DEPARTMENT OF ELEMENTARY & SECONDARY EDUCATION
Chris L. Nicastro, Ph.D. • Commissioner of Education

205 Jefferson Street, P.O. Box 480 • Jefferson City, MO 65102-0480 • dese.mo.gov


September 30, 2014 


The Honorable Tom Dempsey 

President Pro Tem, Missouri State Senate 
201 West Capitol Avenue 

State Capitol Building, Room 326 
Jefferson City, Missouri 65101 


The Honorable Timothy Jones 

Speaker, Missouri House of Representatives 
201 West Capitol Avenue 

State Capitol Building, Room 308 

Jefferson City, Missouri 65101 


Dear Senator Dempsey and Representative Jones: 


Section 160.526, RSMo., requires that the Commissioner of Education inform the President Pro 
Tem of the Senate and the Speaker of the House about the procedures to implement the statewide 
assessment system, including a report related to the reliability and validity of the assessment 
instruments, at least six months prior to the implementation of the statewide assessment system. 
In compliance with that statute, I am pleased to provide the following information about updates
to the Missouri Assessment Program for the 2014-2015 school year. 


For the past several years, the Department of Elementary and Secondary Education has been 
developing updated, high-quality assessments in English language arts and mathematics. In
compliance with HB 1490, signed by Governor Nixon in July, these assessments will be
implemented in the spring of 2015. As prescribed by the legislature, in order to ensure that all 
Missouri high school students graduate college and career ready, the Department of Elementary 
and Secondary Education is committed to implementing a reliable and valid assessment that will 
assess the knowledge, skills, and competencies called for in The Show-Me Standards. We have 
consulted with national experts and involved Missouri teachers in the development process. 


The State Board of Education approved the following Missouri Assessment Program for 2014-2015.


Grade-Level Assessments 


• All grade-level English language arts, mathematics, and science summative assessments
will be delivered online.


Phone 573-751-4446 • Fax 573-751-1179 • commissioner@dese.mo.gov




• The English language arts and mathematics summative assessments for grades 3
through 8 will use a Computer Adaptive Test (CAT) blueprint. The mathematics
summative assessment for grades 3 through 8 will consist of approximately 33 items.
The grades 3 through 5 English language arts assessments will consist of approximately
43 items, and the grades 6 through 8 English language arts assessments will consist of
approximately 44 items. Students in grades 5 and 8 will take summative English language
arts and mathematics assessments that also include a performance task in addition to
the CAT portion of the assessments.

• A digital library of formative assessment resources will be available to Missouri
educators at no cost to school districts and charter schools.

• English language arts and mathematics interim assessment resources will be available
to all Missouri public schools.


MAP-Alternate (MAP-A) Grade-Level Assessment for students with significant cognitive 
disabilities 

• English language arts and mathematics MAP-A assessments will be online,
computer-adaptive assessments.

• An instructionally embedded assessment model will be used for the MAP-A English
language arts and mathematics. The comprehensive assessment system is designed to
support student learning and to more validly measure what students with the most
significant cognitive disabilities know and can do.


End-of-Course (EOC) Assessments 
• English language arts and mathematics EOC assessments have been updated to reflect
the Missouri Learning Standards.
• A Physical Science EOC has been added to the Science subject area.


College and Career Readiness 
• The ACT® Plus Writing will be administered to all public school 11th grade students at
no cost in order to provide a statewide measure of college and career readiness.


Validity is the overarching component of the Missouri Assessment Program. The following 
excerpt is from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 
1999): 


Ultimately, the validity of an intended interpretation of test scores relies on all the available evidence 
relevant to the technical quality of a testing system. This includes evidence of careful test 
construction; adequate score reliability; appropriate test administration and scoring; accurate score 
scaling, equating, and standard setting; and careful attention to fairness for all examinees. 




The validity and reliability of an assessment begin with the foundational content expectations,
the development of items assessing the content, and the validation by educators that the items are 
accessible, fair, and representative of the content expectations. Additional evidence of validity and 
reliability is provided by item trials, pilots, and finally the official field testing of assessment items. 
The items are reviewed statistically for performance and fairness to all students. As the items 
move into operational administration, their performance is monitored to ensure that items 
continue to perform in a way that is fair to all students. The reporting and use of the data 
generated by assessments builds the final piece of the validity argument. 


As an example, the following is an overview of the validation plan for the grade-level assessment
resources.


The purposes of the summative assessments are to provide valid, reliable, and fair 
information about: 

• Students’ English language arts and mathematics achievement with respect to those
English language arts and mathematics Missouri Learning Standards measured by the
English language arts and mathematics summative assessments.

• Whether students have demonstrated sufficient academic proficiency in English
language arts and mathematics to be on track for achieving college and career
readiness.

• Students’ annual progress toward college and career readiness in English language arts
and mathematics.

• How instruction can be improved at classroom, school, district, and state levels.

• Students’ English language arts and mathematics proficiencies for federal and state
accountability purposes and potentially for local accountability systems.

• Students’ achievement in English language arts and mathematics across students and
subgroups of students.


The purposes of the interim assessments are to provide valid, reliable, and fair 
information about: 
• Student progress toward mastery of skills.
• Students’ performance at the content cluster level, so that teachers and administrators
can monitor student progress throughout the year and adjust instruction accordingly.
• Individual and group (e.g., school, district) performance at the claim level in English
language arts and mathematics, to determine whether teaching and learning are on
target.
• Student progress toward the mastery of skills measured in English language arts and
mathematics across all students and subgroups of students.




The purposes of the formative assessment resources are to provide measurement tools 
and resources to: 


• Improve teaching and learning.
• Monitor student progress throughout the school year.
• Help teachers and other educators align instruction, curricula, and assessment.
• Help teachers and other educators use the summative and interim assessments to
improve instruction at the individual student and classroom levels.


We appreciate the support of the Missouri General Assembly in our endeavors and look forward to 
working with you in the future to raise the level of performance of students in Missouri's public 
schools. 


If, after reviewing the report, you have any questions, please give us a call.

Sincerely,

Chris L. Nicastro 

Commissioner of Education 


Attachment 


c: State Board of Education 


Missouri Assessment Program
Updates and Changes
2014-2015

September 2014


It is the policy of the Missouri Department of Elementary and Secondary Education not to 
discriminate on the basis of race, color, religion, gender, national origin, age, or disability in its 
programs or employment practices as required by Title VI and VII of the Civil Rights Act of 1964,
Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, the 
Age Discrimination Act of 1975 and Title II of the Americans with Disabilities Act of 1990.
Inquiries related to Department programs and to the location of services, activities, and facilities 
that are accessible by persons with disabilities may be directed to the Jefferson State Office 
Building, Office of the General Counsel, Coordinator—Civil Rights Compliance (Title VI/Title 
IX/504/ADA/Age Act), 6th Floor, 205 Jefferson Street, P.O. Box 480, Jefferson City, MO 65102-0480;
telephone number (573) 526-4757 or TTY (800) 735-2966, fax (573) 522-4883, email
civilrights@dese.mo.gov.


Table of Contents

I. Legislation and Policy Directives
II. Establishing a Foundation for the Missouri Assessment Program
III. About the Missouri Assessment Program
IV. Developing the Missouri Assessment Program
V. Administering and Scoring the Missouri Assessment Program
VI. Reporting and Using the Results of the MAP
VII. Technical Considerations
VIII. A Final Note

Appendices




I.
Legislation and Policy Directives 


History of the Missouri Assessment Program 


The Missouri Assessment Program (MAP) is designed to measure how well students acquire the skills 
and knowledge described in Missouri's Show-Me Standards — delineated in grade-levels, content areas, 
and courses within the Missouri Learning Standards (MLS) — in order for all high school graduates to be 
college and career ready. The assessments yield information on academic achievement at the student, 
class, school, district, and state levels. This information is used to diagnose individual student strengths 
and weaknesses in relation to instruction aligned to the MLS and to gauge the overall quality of 
education throughout Missouri. 


The MAP traces its origin to the 1993 Outstanding Schools Act. This act required Missouri to create a 
statewide assessment system that measured challenging academic standards. Additionally, the State 
Board of Education directed the Missouri Department of Elementary and Secondary Education (the 
Department) to identify the knowledge, skills and competencies that Missouri students should acquire 
by the time they complete high school and to assess student progress toward those academic standards. 
The Department worked with teachers, school administrators, parents and business professionals from 
throughout the state to develop the “Show-Me Standards.” From this act, grade-span assessments were 
created that measured Missouri's Show-Me Standards. Originally, MAP was designed to be a grade-span
test: Grades 3, 7, and 11 in Communication Arts, Grades 4, 8, and 10 in Mathematics, and Grades 3, 7, 
and 10 in Science. 


In 2001, the federal No Child Left Behind (NCLB) legislation was enacted. In accordance with the NCLB 
legislation, student performance, reported in terms of proficiency categories, is used to determine the 
adequate yearly progress (AYP) of the school, district, and state using student performance results from 
the Missouri Assessment Program. NCLB required states to develop grade-level tests in both Reading 
and Mathematics to be administered in Grades 3 through 8 and once in high school. It also required that 
states have in place Science assessments to be administered at least once in Grades 3 through 5, once in 
Grades 6 through 9, and once in high school. 


In the 2008-2009 school year, Missouri began administering End-of-Course (EOC)
assessments in lieu of high school grade-level assessments. EOCs were implemented because they
connected Missouri’s high school accountability assessments directly to courses students take and they
allowed school districts and charter schools to hold high school students accountable locally for their
performance on the assessments. Algebra I, English II and Biology were the first EOCs administered. The
following year, Government, American History, English I, Algebra II and Geometry became operational.
The move to EOC assessments was also a move to online testing. In the first few years of EOC, districts 
had a choice between online testing and traditional paper/pencil assessments. EOCs moved fully online 
beginning in the fall of 2010. 


The 2014-2015 school year is another time of transition for the Missouri Assessment Program.
Grade-Level assessments in English language arts and mathematics in grades 3-8 and science in grades 5 and 8
will be administered fully online for the first time. In addition, the English language arts and mathematics
Grade-Level and EOC assessments will be aligned to the updated English language arts and mathematics
Missouri Learning Standards.




II.
Establishing a Foundation for the 
Missouri Assessment Program 


The Show-Me Standards 


The Show-Me Standards provide the basis for the MAP. These standards are designed to ensure that 
high school graduates can lead productive, fulfilling and successful lives as they continue their 
education, enter the work force and assume civic responsibilities. They set high expectations for learning 
and instruction, and encourage the development of challenging curricula in schools throughout the 
state. Beginning with the passage of the Outstanding Schools Act in 1993, hundreds of Missouri 
teachers, parents, and business professionals participated in the process of developing the Show-Me 
Standards, which were approved by the State Board of Education in January 1996. 


There are a total of 73 Show-Me Standards (Appendix A). Forty of these are knowledge (content)
standards, intended to delineate a solid foundation of knowledge and skills in the traditional subject
areas (reading, writing, mathematics, world and American history, government, geography, science, 
health, physical education, and fine arts). The remaining 33 standards are process standards that require 
students to demonstrate and apply their content knowledge in a variety of situations. The process 
standards are grouped under four broad goals that are relevant to all content areas. 


The Show-Me Standards were developed with the understanding that in order to be successful and 
productive, Missouri's students must have a solid foundation of knowledge and skills, as well as the 
ability to apply their learning to the kinds of problems and situations they will encounter after 
graduating from high school. The standards promote the concept that active, hands-on learning will 
benefit students of all ages. When basic knowledge and skills are integrated and applied in practical and 
challenging ways across subject areas, learning becomes engaging and motivating. Such learning stays in 
the mind long after tests are over and classrooms are left behind. 


Although the Show-Me Standards define concepts that are significant to success in school, society and 
the workplace, they do not specify everything students should learn in school. The Show-Me Standards 
provide a solid foundation on which districts can build a challenging curriculum that will help all students 
reach their maximum potential. 


Missouri Learning Standards


School districts must ensure that their curricula and instructional programs address the Show-Me
Standards. The Missouri Learning Standards (MLS) were developed to provide guidance to school
districts in this process. The Missouri Learning Standards represent an update to the Grade-Level 
Expectations (GLEs) and Course-Level Expectations (CLEs), which were developed in response to the 
move to grade-level assessments in mathematics and reading required by NCLB and implemented in the 
2005-2006 school year. The GLEs and CLEs explicate the Show-Me Standards, providing specific targets 
for instruction and assessment. The MLS continue the precedent set by the GLEs and CLEs, by
delineating the knowledge, skills, and abilities that students need to acquire at each grade level and/or 
course to progress toward the Show-Me Standards and college and career readiness along with targets 
for the statewide summative assessments. Missouri educators contributed to the development and 
review of the Missouri Learning Standards (MLS) to ensure that they reflect the realities of the 
classroom. The MLS provide the minimum content expectations for each grade and course and provide
the clarity and consistency teachers need to make sure their students are on track and equipped with 
the knowledge and skills they need for success. 




The MLS do not tell teachers how to teach, but rather establish the minimum of what students need to 
learn. It is up to schools and teachers to decide how to best help students reach the standards. It is also 
important to note that the standards do not include everything that could or should be taught. Local 
districts are still designing their own curriculum and choosing which texts to read, along with a long list 
of other local education choices. 


Additionally, the Missouri Learning Standards: 


• Help colleges and professional development programs better prepare teachers.

• Establish a foundation for educators to work collaboratively with their peers to develop and share
resources, expertise, curriculum tools and professional development.

• Guide educators toward curriculum and teaching strategies that will give students a deep
understanding of the subjects and skills they need to learn.


The MLS take a spiraled approach, where key knowledge and concepts are introduced at early grade 
levels and then further developed in later grades. The content expectations encourage the integration of 
key concepts across subject areas. Important knowledge and skills traditionally taught in one subject 
area are reinforced and developed in other areas, broadening students’ perspectives and understanding. 


The Missouri Learning Standards are not a statewide mandated curriculum. Rather, they provide a 
framework for local curriculum development. The purpose of the MLS is to provide support for districts 
as they develop local curriculum guides that address the standards. Missouri law ensures local control of 
education. Each school district determines how its curriculum is structured. School districts have already 
developed curriculum guides reflecting the local approach to all students being college and career ready 
as envisioned by the Show-Me Standards. 


HB 1490 of 2014 

In accordance with HB 1490, which was passed in the 2014 legislative session and signed by Governor 
Nixon, advisory groups consisting of parents, classroom teachers and other education professionals, and 
career and technical education representatives are studying the learning standards and academic 
performance standards for the state. 


These advisory work groups are reviewing the current learning standards in English language arts,
mathematics, science, and history and government. For each subject area, there are two work groups:
one for grades K-5 and another for grades 6-12. Members of the work groups were chosen by the
Missouri Senate President Pro Tem and Speaker of the House, the Governor and Lieutenant Governor, 
the Commissioner of Higher Education, and the State Board of Education. 


The work groups will present recommended changes to the learning standards to the State Board of
Education in October 2015. Should the State Board of Education approve the
recommendations, the updated learning standards will be implemented no earlier than the 2016-2017
school year. New assessments based on the standards will need to be developed.




III.
About the Missouri Assessment Program 


Subject Areas and Grade Levels for Statewide Assessment 


The Missouri Assessment Program (MAP) has statewide assessments that cover the following grades 
and content in the Show-Me Standards—mathematics, communication arts, science, and social studies. 
• English language proficiency, grades K-12 (ACCESS for ELLs)
• Grades 3-8 English language arts and mathematics
• Grades 3-8 and 11 MAP-Alternate (MAP-A) English language arts and mathematics integrated
yearlong assessment program for students with severe cognitive disabilities
• Grades 5 and 8 science
• Grades 5, 8, and 11 MAP-Alternate science for students with severe cognitive disabilities
• Algebra I, Geometry, Algebra II, English I, English II, American History, Government, Biology,
Physical Science, and Personal Finance end-of-course assessments
• 11th grade ACT® Plus Writing


Item Types 


Each state assessment may include up to four types of items: multiple-choice items,
technology-enhanced items, constructed-response items and performance events/tasks.


Multiple-choice items present students with a question followed by four or five response options, one 
of which is correct. The advantages of these items are: 1) they are effective in measuring students’ 
content knowledge and understanding; and, 2) a large number of these items can be administered and 
scored in a short amount of time, so that a wide range of knowledge and skills can be tested. The major 
limitation of multiple-choice items is that they do not adequately measure students’ ability to apply 
what they know. 


Technology-enhanced items are computer-delivered items that include specialized interactions for 
collecting response data. These include interactions and responses beyond traditional selected-response 
or constructed-response. The advantages of these items are: 1) they are effective in measuring students’ 
content knowledge and understanding; 2) they allow students to demonstrate what they know in an 
authentic way; and, 3) they may be scored electronically. 


Constructed-response items require students to supply (rather than select) an appropriate response. 
Students might be asked to provide a one-word answer, complete a sentence or show their work in 
solving a problem. In addition to measuring students’ content knowledge, constructed-response items 
can provide some information about how students arrived at their answers. These items are more time 
consuming than multiple-choice items to administer and score; however, they provide more information 
about students’ understanding and thinking. 


Performance events and performance tasks measure students’ knowledge and their ability to apply
that knowledge in problem situations. Performance events and tasks are one type of performance 
assessment. In its purest form, performance assessment requires students to demonstrate what they 
know. This type of assessment has been used for years in schools (e.g., band, business courses, and 
drivers’ education). 




The performance events and tasks used in the MAP may require a student to work through a
complicated problem or present a written argument. Students should be able to complete an
event in 35 to 120 minutes, depending on the demands of the task and the grade, course, and/or
content area. Performance events and tasks often allow for more than one approach to reach a correct
answer. The advantage of this type of assessment is that it provides insight into a student’s ability to
apply knowledge and understanding in various situations. The disadvantage is that performance events
and tasks are often time consuming and costly to administer and score.


Inclusion of Special Populations 


Throughout the development of the MAP, inclusion of special populations has been a goal.
Accountability for all students was mandated through the 2004 reauthorization of the Individuals with
Disabilities Education Act (IDEA). Assessment development activities include educators with experience with
disabilities, special education needs, and English language learners (ELL). Field tests in math and English
language arts have included these students with special recruitment targets in order to ensure that
special populations of students will be able to access the items and that the items are fair to all
students.


The accommodations for the Grade-Level and End-of-Course assessments have been updated beginning
with the fall 2014 EOC administration. With the move to online assessments for grades 3-8, many
paper-and-pencil assessment accommodations are now part of the user interface. Online testing provides
supports to students in three areas: Universal Tools, Designated Supports and Accommodations.


• Universal Tools are available to all students taking a Grade-Level or End-of-Course assessment.
Examples of Universal Tools in the online administration platform include expandable passages,
a highlighter, keyboard navigation, a mark for review function, a notepad, a protractor, a ruler,
and spell check among other functionalities based on item type.

• Designated Supports are available to students when deemed appropriate by a team of
educators. Examples of Designated Supports in the online administration platform include color
contrast, color overlay, magnification, masking, scribing, and turning off universal tools among
other functionalities based on item type.

• Accommodations must appear in an IEP/504 plan and include American Sign Language,
alternative response options (adapted keyboards, sticky keys, MouseKeys, etc.), multiplication
tables, and speech-to-text among many other accommodations.


For Special Education students, the IEP team should choose all of the designated supports and 
accommodations a student will receive. Some designated supports and accommodations are only for 
ELL students. 


The list of Grade-Level Accommodations, Supports and Tools may be found in Appendix B and the list of 
EOC Accommodations, Supports and Tools may be found in Appendix C. 


ACT provides a variety of approved accommodations for students with IEPs and 504 plans. Assessments 
administered using ACT-approved accommodations will result in college reportable ACT® scores. A
student's IEP should reflect appropriate ACT-approved accommodations that will allow the student 
access to test content. High school ACT Test Accommodations Coordinators will receive training from 
ACT in fall 2014 to assist them in navigating the accommodations approval process. 




Sample and Practice Assessment Items 


The Department has traditionally provided practice and sample items to Missouri educators. This 
practice continues as the assessment program is updated for the 2014-2015 school year. The following 
table provides the type of sample and practice items available during the year to Missouri educators for 
professional development and use with students. 


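[Table: sample and practice items available by assessment. Recoverable row labels: English language arts and mathematics; Grades 3-8 MAP-A English language arts and mathematics. The column headings (ending in "Items", "Tests", and "Assessments") and the cell contents did not survive extraction.]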


Appendix D contains a sample of items from the Missouri Assessment Program. 




IV. 
Missouri Assessment Program Test Development 


Grades 3-8 English Language Arts and Mathematics Development 


A valid process of test development requires adherence to best practices, making possible the creation 
of tests that measure what is intended for a specified population. The various activities and analyses in 
the Test Development Phase can be viewed as establishing “Process Validity.” Evidence that the 
assessments have met Process Validity requirements is found in the documentation of key development 
activities (listed more or less in order of sequence): 


• Establishment of a validity framework, as outlined in our Research Agenda Recommendations
located in Appendix E, to guide test development and ongoing research

• Development of a comprehensive set of content specifications based on the Missouri Learning
Standards

• Development of test blueprints that specify number and types of questions to be presented to
students

• Development of task models for items and stimuli that guide the writing and review of individual
test items and passages

• Content and bias/sensitivity reviews, to ensure that items and passages are aligned to MLS
content, are consistent with evidence statements of the content specifications, and are not
biased in favor of or against students from different cultural and demographic backgrounds

• Cognitive lab on item types and accommodations to uncover issues and opportunities with
modes of presentation, tools, and other item features. The final cognitive lab report may be
found in Appendix F

• Establishment of a comprehensive accessibility framework and guidelines, found in Appendix G,
to ensure that the assessments are accessible to the widest possible array of students

• Small-scale trials to investigate the use and feasibility of different item types. The final
small-scale trials report may be found in Appendix H

• The procurement of software for computer adaptive test delivery, which is used to select items
to be presented to students on the basis of both meeting the test blueprint and selecting items
that maximize the accuracy of the student’s score. The technical requirements for the system
may be found in Appendix I

• Implementation of large-scale pilot tests in order to collect data on the initial performance of
items and the testing platform software. The results of the pilot test may be found in Appendix J

• Implementation of large-scale field tests, described in Appendix K, to collect data on all items to
evaluate their technical adequacy and their placement on a continuous growth scale from
grades 3 through 11

• Analysis of the alignment, described in Appendix L, among all components of the assessment
design, and ultimately between the Missouri Learning Standards and the tests students actually
take

• Establishment of internal validity, or the degree to which the test functions as required, has
sufficient reliability, and has sufficient ability to measure the intended content and not unintended
content. Internal validity was investigated using Pilot Test results to determine whether or not a
given content area test (ELA or mathematics) measured the intended construct and not
unintended constructs. Essentially, this is an investigation as to whether or not the test is
measuring primarily one construct (i.e., if it is uni-dimensional). As indicated in the
dimensionality paper, included as Appendix M, the evidence strongly suggests that the ELA and
mathematics tests are uni-dimensional. Test reliability will initially be modeled through
simulations using the item pool after item review, which is due to be completed December 31,
2014. Operational test reliability will be reported in the technical manual following the first
operational administration in spring 2015.
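
As an illustration only, the computer adaptive item selection described in the list above (choosing items that both satisfy the test blueprint and maximize the accuracy of the student's score) can be sketched as follows. This is a minimal sketch and not the procured delivery software. It assumes a one-parameter (Rasch) item response model, a blueprint expressed as a maximum number of items per content area, and hypothetical item fields (id, area, difficulty); none of these details are specified in this report.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under an assumed Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability, difficulty):
    """Fisher information of a Rasch item at the given ability estimate."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)


def select_next_item(ability, pool, administered, blueprint, counts):
    """Return the most informative unused item that the blueprint still allows.

    pool         -- list of dicts with hypothetical keys "id", "area", "difficulty"
    administered -- set of item ids already presented to the student
    blueprint    -- maximum number of items allowed per content area
    counts       -- number of items already presented per content area
    """
    eligible = [
        item for item in pool
        if item["id"] not in administered
        and counts.get(item["area"], 0) < blueprint.get(item["area"], 0)
    ]
    if not eligible:
        return None  # blueprint satisfied or pool exhausted
    return max(
        eligible,
        key=lambda item: item_information(ability, item["difficulty"]),
    )
```

Under these assumptions, an item is most informative when its difficulty is close to the student's current ability estimate, so the sketch tends to present items near that estimate while the blueprint limits keep content coverage balanced.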


A total of one hundred and twenty-nine Missouri educators were selected to participate in the 
development of the 2014-2015 grade-level assessments. Eighteen Missouri educators were selected to 
serve as item writers along with fifty-two Missouri educators who were selected to serve as reviewers. Eleven
Missouri educators were selected to serve on content and bias review committees. 


Two hundred and sixty-nine Missouri school districts and charter schools had the opportunity to 
participate in the spring 2013 item type pilot and two hundred and sixty-seven school districts 
participated in the spring 2014 item field test. The full list of school districts and charter
schools that had the opportunity to participate in the spring 2013 pilot test may be found in Appendix N,
and the list of those school districts and charter schools that participated in the spring 2014 field test
may be found in Appendix O.


Preliminary spring 2015 test designs and blueprints may be found in Appendix P. 


MAP-A English Language Arts and Mathematics Assessments 


Missouri educators, especially those from the Missouri State Schools for the Severely Handicapped, have 
served as item writers, and as item and passage reviewers during the development of the new English 
language arts and mathematics MAP-A assessments used with the most significantly challenged 
students in the state. Using principles of evidence-centered design, item writers with expertise in English 
language arts, mathematics, and instruction for students with significant cognitive disabilities 
developed testlets. Each testlet contains an engagement activity and three to seven questions. 


Every testlet goes through multiple rounds of review by testing vendor assessment development staff, 
internal item reviewers, editors, and educators who serve as external reviewers. Each reviewing group is 
carefully trained to look for potential problems with the academic content, accessibility issues, and 
concerns about bias or sensitive topics. Staff review results from field tests to determine which testlets 
meet quality standards and are ready for operational assessment. 


About 220 Missouri educators took part in the development of the 2014-2015 MAP-A assessments. 
Approximately 20 Missouri educators served as item writers along with about 177 Missouri educators 
serving as item reviewers. 


192 school districts participated in the spring 2014 item field test. The full list of participating school 
districts, charter schools and school buildings may be found in Appendix Q. 


Preliminary test designs and blueprints may be found in Appendix R. 


English Language Arts and Mathematics End-of-Course Assessments (EOCs) 


In September 2007 and June 2008, Riverside Publishing conducted the first round of item-writing 
workshops to develop selected response (SR) items for English II and Algebra I as well as writing prompts 
for English II and performance events (PEs) for Algebra I. These workshops were conducted at the Assessment 
Resource Center (ARC) in Columbia, Missouri and followed established assessment industry best practices. Participants in 
the workshops included Missouri educators, DESE staff, and Riverside Publishing TDSs. The workshops 
were held over a five-day period and were conducted with 15-20 teacher participants per content area. 
Teacher participants were selected by DESE to represent school districts throughout Missouri. 
Requirements to be an item writer included experience in classroom teaching and expert content 
knowledge. The content developed at the workshops was based on the Missouri Show-Me Standards 
and Course-Level Expectations. 


The English II participants wrote selected response items associated with the passages that had been 
developed prior to the item-writing workshops. The Algebra I participants wrote SR items and PEs along 
with scoring guides. 


During the item-writing workshops, Riverside Publishing Test Development Specialists (TDSs) conducted 
training sessions with the item writers and provided instructions on avoiding bias and stereotyping of 
groups and individuals on the basis of gender, race, ethnicity, religion, age, language, socioeconomic 
group, and disability. Riverside Publishing TDSs also trained item writers to write items that adhere to 
the principles of universal design, making the items accessible to the widest range of students. For 
example, items and passages were written using clear and concise language, and all art, graphs, and 
tables were labeled and were not overly crowded with extraneous information. Instruction was also 
provided on developing items at particular cognitive levels based on Norman Webb’s Depth of 
Knowledge (DOK) levels. As items were produced, they were continuously reviewed, revised, edited, and 
evaluated by Riverside Publishing TDSs and DESE staff. Item writers who generated high-quality work on 
or ahead of schedule were given additional assignments. 


All items and passages went through several rounds of internal reviews, including content and editorial 
reviews. Riverside Publishing TDSs reviewed each item with respect to alignment, clarity, and 
correspondence with item specifications. 


Following item writing, twenty Missouri educators participated in a content and bias review process for 
each content area. The committee members read and reviewed each item. Discussions were held about 
whether the items met the criteria listed above. The committees then rejected or revised any items they 
deemed unsatisfactory. If there was disagreement about how to proceed with an item, the Riverside 
Publishing facilitator polled the group and followed the direction of the majority. Approximately 95% of 
the items were accepted (as-is or with edits) by the content and bias committees. 


Similar workshops and reviews were subsequently held for the rest of the EOC assessments. 


In 2012 and 2013 the English language arts and mathematics EOC item banks were aligned to the 
Missouri Learning Standards by panels of Missouri educators led by the Department’s Directors of 
English Language Arts and Mathematics. The content directors then worked with test development 
specialists from the Department’s testing vendor, CTB/McGraw-Hill, to design a test blueprint for the 
mathematics and English language arts EOCs similar to the previous assessments, but aligned to the 
Missouri Learning Standards. The test design and blueprints may be found in Appendix S. 


Physical Science EOC 


During the spring 2013 legislative session, an increase in the assessment appropriation included money 
targeted for the development and launch of an additional science EOC. The Department partnered with 
the Iowa Testing Program located at the University of Iowa to develop a test blueprint and lease items 
to populate a Missouri-unique EOC aligned to the physical science Missouri Learning Standards. 
The assessment is an EOC that may be used at school district and charter school discretion. The test 
design and blueprints may be found in Appendix S. 


V. 
Administering and Scoring the Missouri Assessment Program 


Administering the State Assessment 


The 2014-2015 Missouri Assessment Program brings additional resources to Missouri public and charter 
school educators. A digital library of formative resources and interim assessments for grade-level English 
language arts and mathematics will be available for educators to use as appropriate based on their local 
curriculum and instructional plan. These resources are provided at no charge to school districts and 
educators through the generosity of the Missouri Legislature’s appropriation and the Governor’s 
support. The interim assessments will be available in the same platform as the summative assessment to 
allow for complete educator and student comfort with the switch to online testing. The interim 
assessment data will be available only to school districts and charter schools. 


Missouri school districts and charter schools will administer the MAP in designated grade levels, content 
areas, and courses as they have in the past. All assessments will be administered online. Testing 
windows may be found in Appendix T. 


Scoring the State Assessments 


Several methods will be used to score the different components of the state assessment. Multiple-choice 
and technology-enhanced items will be machine-scored. Constructed-response items, performance 
events, and performance tasks will be hand-scored by human readers. To ensure that the state 
assessments are scored quickly, and that the results are returned to districts in a reasonable amount of 
time, students’ responses to the constructed-response items and performance events will be read by 
professional item scorers. This scoring will be organized and conducted by Missouri’s assessment 
administration, scoring, and reporting vendors. 


Hand-scoring the assessments is a critical piece in the development of the MAP. Much work is done 
months prior to the scoring activity. During this time, Missouri educators and Department staff, in 
partnership with Missouri's assessment vendors, develop and review “score points” to ensure consistent 
grading of the papers. The Department monitors the reliability and validity of this scoring through 
frequent inter-rater reliability reports from partner vendors. 


There are several steps in the training and qualifying process to score Missouri’s constructed-response items, 
performance events, and performance tasks. Potential scorers review the scoring guides and exemplar 
student responses. The potential scorers then work through several training rounds to accurately apply 
the scoring guide to specific types of responses. Once training is completed, the potential scorers must 
pass a qualifying round in order to score student papers. 


Several technical methods are used to help maintain the accuracy of individual scorers during the 
scoring process. One method is the use of pre-scored papers, which are periodically sent through the 
process without the scorers’ knowledge. Scorers who do not score these “check sets” 
accurately are retrained and monitored closely. 
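
The check-set monitoring described above can be sketched in a few lines of Python. This is an illustration only: the 80% exact-agreement threshold, the function names, and the sample data are assumptions made for the example, not criteria taken from Missouri's scoring vendors.

```python
from collections import defaultdict

# Hypothetical threshold: minimum share of check-set papers a scorer must
# score exactly as pre-scored. The actual vendor criterion is not stated.
MIN_EXACT_AGREEMENT = 0.80


def check_set_agreement(records):
    """Return {scorer_id: exact-agreement rate} on pre-scored papers.

    Each record is (scorer_id, assigned_score, pre_assigned_score).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for scorer_id, assigned, expected in records:
        totals[scorer_id] += 1
        if assigned == expected:
            hits[scorer_id] += 1
    return {s: hits[s] / totals[s] for s in totals}


def scorers_needing_retraining(records, threshold=MIN_EXACT_AGREEMENT):
    """List scorers whose check-set agreement falls below the threshold."""
    rates = check_set_agreement(records)
    return sorted(s for s, rate in rates.items() if rate < threshold)


# Illustrative data only: (scorer, score given, pre-assigned score).
sample = [
    ("A", 3, 3), ("A", 2, 2), ("A", 4, 4), ("A", 1, 1), ("A", 2, 3),
    ("B", 3, 2), ("B", 2, 2), ("B", 4, 3), ("B", 1, 1), ("B", 0, 1),
]
flagged = scorers_needing_retraining(sample)
```

In this made-up sample, scorer A matches four of five check-set papers and scorer B matches two of five, so only B would be flagged for retraining under the assumed threshold.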


Processing Data and Distributing Results 


With the move to the online administration of grade-level assessments, Missouri public 
elementary and middle school educators will experience what has become the norm for high school 
educators in Missouri. State assessment results will be returned quickly after the close of the school 
district’s or charter school’s state testing window. The Department’s testing vendor will return student 
English language arts and mathematics results to school districts within 10 business days of the close of 
the school district’s testing window. The state will receive assessment results from our partner 
vendors in early July 2015. 


VI. 
Reporting and Using the Results of the 
Missouri Assessment Program 


Establishing Grade-Level English Language Arts and Mathematics Achievement Levels 


The grade-level English language arts and mathematics achievement levels will be established using a 
multi-step process designed to allow the participation of many Missouri educators. The process is 
designed to ensure that the resulting four achievement levels are valid, reliable, and fair measurements 
of college- and career-readiness for all students. 


Achievement level setting will take place in three phases: 


1. An online panel (scheduled for October 6-17) will allow up to 250,000 K-12 educators, higher 
education faculty, parents, and other interested parties to participate virtually in recommending 
achievement levels. 

2. An in-person workshop (October 13-19) with panels of educators and other stakeholders 
working in grade-level teams will deliberate and make recommendations for the thresholds of 
the four achievement levels. 

3. The vertical articulation committee, a subset of the in-person workshop, will then examine 
recommendations across all grades to consider the reasonableness of the system of cut scores. 


The approach to achievement level setting emphasizes collaboration and transparency to establish a 
consistent means of measuring student progress on the interim and summative assessments. The online 
panel and the in-person workshop will provide an unprecedented opportunity to engage thousands of 
educators and interested stakeholders, raising awareness about the importance and rigor of the 
assessments. The results of the achievement level setting will be presented to the State Board of 
Education for their approval. 
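
Once cut scores are approved, assigning a student to one of the four achievement levels is a simple threshold lookup. The sketch below is illustrative only: the cut scores, the scale, and the level labels are placeholders invented for the example, and the rule that a score equal to a cut falls in the higher level is an assumption.

```python
from bisect import bisect_right

# Hypothetical cut scores on an arbitrary scale; the real thresholds are
# recommended by the panels and approved by the State Board of Education.
CUT_SCORES = [2400, 2450, 2500]  # three cuts define four levels
LEVELS = ["Level 1", "Level 2", "Level 3", "Level 4"]  # placeholder labels


def achievement_level(scale_score, cuts=CUT_SCORES, labels=LEVELS):
    """Map a scale score to one of four achievement levels.

    Assumption: a score equal to a cut score is placed in the higher level.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut scores")
    return labels[bisect_right(cuts, scale_score)]
```

For example, with these placeholder cuts, `achievement_level(2399)` returns `"Level 1"` and `achievement_level(2450)` returns `"Level 3"`.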


Establishing MAP-A English Language Arts and Mathematics Achievement Levels 


A panel of educators with experience working with students with significant cognitive disabilities will 
convene in early June 2015 to establish achievement level cut points. Given the unique nature of the 
assessments and the student population, the achievement level setting teams for the English language 
arts and mathematics MAP-A assessments will focus on addressing the following questions: 


• What is the acceptable level of mastery certainty to be proficient on a node? 

• When combining information across nodes, what threshold defines proficiency for the linkage 
level? 

• How many linkage levels, across Essential Elements, must be mastered for each performance 
level? 


The results of the achievement level setting will be presented to the State Board of Education for their 
approval. 


Validating 5th and 8th Grade Science Achievement Level Cut Points 


Given the change in the administration format of the 5th and 8th grade science assessments, the 
Department will convene a panel of educators in early June 2015 to review the results of the first 
statewide online administration of the 5th and 8th grade science assessments. The panelists will review 
the results and the application of the historical achievement level cut points, and will make a 
recommendation to the State Board of Education regarding any adjustments needed to cut points due 
to the administration of the assessments online. 


Validating English Language Arts and Mathematics Achievement Level Cut Points for EOCs 


Given the change in the blueprint of the English language arts and mathematics end-of-course 
assessments, the Department will convene a panel of educators in mid-February to review the results of 
the first statewide online administration of the updated English language arts and mathematics EOCs 
and the new physical science EOC. The panelists will review the results and the application of the 
historical achievement level cut points, and will make any necessary adjustments to the achievement 
level descriptors. In early June 2015, a small group of the panelists will reconvene to review the results of 
the spring administration of the updated and new EOCs, and make a recommendation to the State 
Board of Education regarding any adjustments needed to cut points due to the administration of the 
assessments online. 


Proposed Report Forms 


An individual student report (ISR) for parents will be delivered to public schools by the Department’s 
testing vendor. The public schools will be responsible for distributing the individual student report to 
parents. The ISR will describe the student’s performance on the MAP. This report will provide the 
student’s overall proficiency in a subject area, and how a student performed on a content area claim for 
each assessment taken. Students, parents, teachers and counselors can use the information included on 
this report, along with information gathered through local assessment programs, to improve a student's 
academic performance and to guide decisions about a student's educational options (e.g., which classes 
a student should take, in which school programs a student should participate). A sample student report 
may be found in Appendix U. 


Classroom, grade, school, school district, and state results of the Missouri Assessment Program are 
reported through the Missouri Comprehensive Data System (MCDS). The MCDS is a resource provided 
by the Department that allows school personnel and the public to access education-related data. 


The data made available to the public masks or hides data for groups with 10 or fewer students to 
protect confidential information about individual students, as required by federal law. 
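
The masking rule above can be illustrated with a short Python sketch. The threshold comes from the text (groups with 10 or fewer students are masked); the field names, the `*` placeholder, and the choice to hide the group count along with the result are assumptions made for this example and do not describe how the MCDS is actually implemented.

```python
# From the text: groups with 10 or fewer students are masked, so the
# smallest reportable group has 11 students.
MIN_REPORTABLE_N = 11
MASK = "*"  # placeholder symbol; the actual display convention is assumed


def mask_small_groups(rows, min_n=MIN_REPORTABLE_N):
    """Return a copy of rows with results hidden for small groups.

    Each row is a dict with 'group', 'n_students', and 'pct_proficient'
    (hypothetical field names). The input rows are not modified.
    """
    masked = []
    for row in rows:
        out = dict(row)
        if row["n_students"] < min_n:
            out["n_students"] = MASK
            out["pct_proficient"] = MASK
        masked.append(out)
    return masked


# Illustrative data only.
report = mask_small_groups([
    {"group": "Subgroup X", "n_students": 10, "pct_proficient": 40.0},
    {"group": "Subgroup Y", "n_students": 11, "pct_proficient": 54.5},
])
```

Here the 10-student group is masked and the 11-student group is reported as-is. A production system would also need to guard against recovering a masked value by subtraction from published totals, which this sketch does not attempt.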


Three tools for data reviews are available within the MCDS portal: 
• Quick Facts for basic reports and documents. 
• Guided Inquiry for summary reports allowing simple filters. 
• Advanced Inquiry for in-depth research and analysis. 


The Missouri Comprehensive Data System may be found at http://mcds.dese.mo.gov. 


VII. 
Technical Considerations 


Validity of the MAP 


In the process of test development, the term “validity” refers to the extent to which an assessment 
instrument measures what it is designed to measure. In order to ensure the validity of the MAP, the 
Department of Elementary and Secondary Education and our assessment partners are constructing the 
new English language arts and mathematics grade-level, MAP-A, and end-of-course assessments 
according to the highest standards of the industry. All development is occurring with the guidance of 
expert Technical Advisory Committees and the Department’s technical and psychometric services 
vendor. 


Validity of an assessment is enhanced when items are grade-appropriate. Field testing and substantial 
educator involvement in the development of the MAP items help ensure that they are appropriate for the 
intended grade levels in Missouri. Following each field test, item statistics are generated to evaluate 
each item. Items are accepted, edited or, if necessary, discarded depending upon their performance on 
the field test. 


Another factor that affects the validity of an assessment instrument is item bias. Sound test 
development incorporates measures to eliminate any characteristics in an assessment that might 
unfairly influence student performance. A quality assessment must eliminate any influence by a 
student’s cultural background, ethnicity, gender, race, or socio-economic status. All MAP items are 
reviewed for potential bias. The Department’s assessment vendors produce supporting item statistics 
that indicate whether an item is biased for or against particular subgroups in the student population. 


Ultimately, accurate interpretation of test scores determines the overall validity of the assessment 
program. How well educators, parents, and the general public understand what the tests say is the 
“bottom line.” Missouri educators, parents, and business professionals will be involved in defining and 
describing the levels of achievement that Missouri students are expected to attain. 


Reliability 


The reliability of an assessment refers to the consistency of measurement it provides. Two types of 
reliability are being considered in the development of the MAP. The first is reliability across forms of the 
assessment. In other words, the assessment is reliable if a student would perform similarly on each of 
the three equivalent forms of a MAP subject area assessment. A common test blueprint is used to 
ensure that the difficulty and length of each form of the assessment are similar. Statistical equating 
procedures will be used to create reliable equivalent forms. 


Because a portion of the MAP is performance based and must be hand-scored, inter-rater reliability is 
also being considered. Inter-rater reliability refers to the extent to which two different individuals would 
score a student’s response in a similar manner. To achieve high inter-rater reliability, concise scoring 
guides are created for each item, and scorer training materials are selected that provide clear examples of 
student work at each score point. Each individual scoring student responses will be required to 
complete an extensive training session and pass a “qualifying round” of scoring. A variety of techniques 
will also be used to maintain accuracy throughout the scoring of student responses. 
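
As a rough illustration of how inter-rater reliability can be quantified, the Python sketch below computes three common statistics for two raters who scored the same responses: exact agreement, adjacent agreement (scores within one point), and Cohen's kappa. The choice of these statistics and the sample scores are assumptions for the example; the statistics actually reported by the Department's vendors are not specified here.

```python
from collections import Counter


def _check(r1, r2):
    """Validate that both raters scored the same non-empty set of responses."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("score lists must be non-empty and of equal length")


def exact_agreement(r1, r2):
    """Share of responses given the same score by both raters."""
    _check(r1, r2)
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def adjacent_agreement(r1, r2):
    """Share of responses where the two scores differ by at most one point."""
    _check(r1, r2)
    return sum(abs(a - b) <= 1 for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    _check(r1, r2)
    n = len(r1)
    p_observed = exact_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_chance = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters gave one identical score to every response
    return (p_observed - p_chance) / (1 - p_chance)


# Illustrative scores only: eight responses scored 0-3 by two raters.
rater_1 = [0, 1, 2, 2, 3, 1, 0, 2]
rater_2 = [0, 1, 2, 3, 3, 1, 1, 2]
```

With these made-up scores the raters agree exactly on six of eight responses (0.75), and every pair of scores is within one point of each other.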


VIII. 
A Final Note 


In our fast-paced times, the general knowledge base and technology are changing and expanding at an 
amazing rate. Researchers are constantly identifying new and more effective educational 
methodologies. We must be responsive to these changes in order to provide the best possible 
opportunities for our children, the children of Generation Z. The Department believes that the 
adjustments to the 2014-2015 Missouri Assessment Program reflect best practices and industry 
standards. If Missouri hopes to provide the highest quality education for its students, then the state 
must continue to advocate for change that will promote educational progress. 


Appendix A— The Show-Me Standards 


Missouri students must build a 
solid foundation of factual knowledge 
and basic skills in the traditional 
content areas. The statements listed 
here represent such a foundation in 
reading, writing, mathematics, world 
and American history, forms of 
government, geography, science, 
health / physical education and the fine 
arts. This foundation of knowledge and 
skills should also be incorporated into 
courses in vocational education and 
practical arts. Students should acquire 
this knowledge base at various grade 
levels and through various courses of 
study. Each grade level and each 
course sequence should build on the 
knowledge base that students have 
previously acquired. 

These concepts and areas of study 
are indeed significant to success in 
school and in the workplace. However, 
they are neither inclusive nor are they 
likely to remain the same over the 
years. We live in an age in which 
“knowledge” grows at an ever- 
increasing rate, and our expectations 
for students must keep up with that 
expanding knowledge base. 

Combining what students must 
know and what they must be able to do 
may require teachers and districts to 
adapt their curriculum. To assist 
districts in this effort, teachers from 
across the state are developing 
curriculum frameworks in each of the 
content areas. These frameworks show 
how others might balance concepts and 
abilities for students at the elementary, 
middle and secondary levels. These 
models, however, are only resources. 
Missouri law assures local control of 
education. Each district has the 
authority to determine the content of 
its curriculum, how it will be organized 
and how it will be presented. 


The Show-Me Standards 


Communication Arts 


In Communication Arts, students in Missouri public schools 
will acquire a solid foundation which includes knowledge of and 
proficiency in 


1. speaking and writing standard English (including 
grammar, usage, punctuation, spelling, capitalization) 

2. reading and evaluating fiction, poetry and drama 

3. reading and evaluating nonfiction works and 
material (such as biographies, newspapers, technical 
manuals) 

4. writing formally (such as reports, narratives, essays) 
and informally (such as outlines, notes) 

5. comprehending and evaluating the content and 
artistic aspects of oral and visual presentations 
(such as story-telling, debates, lectures, multi-media 
productions) 

6. participating in formal and informal presentations 
and discussions of issues and ideas 

7. identifying and evaluating relationships between 
language and culture 


Social Studies 


In Social Studies, students in Missouri public schools will 
acquire a solid foundation which includes knowledge of 


1. principles expressed in the documents shaping 
constitutional democracy in the United States 

2. continuity and change in the history of Missouri, the 
United States and the world 

3. principles and processes of governance systems 

4. economic concepts (including productivity and the 
market system) and principles (including the laws of 
supply and demand) 

5. the major elements of geographical study and 
analysis (such as location, place, movement, regions) 
and their relationships to changes in society and 
environment 

6. relationships of the individual and groups to 
institutions and cultural traditions 

7. the use of tools of social science inquiry (such as 
surveys, statistics, maps, documents) 


Mathematics 


In Mathematics, students in Missouri public schools will 
acquire a solid foundation which includes knowledge of 


1. addition, subtraction, multiplication and division; 
other number sense, including numeration and 
estimation; and the application of these operations 
and concepts in the workplace and other situations 

2. geometric and spatial sense involving measurement 
(including length, area, volume), trigonometry, and 
similarity and transformations of shapes 

3. data analysis, probability and statistics 

4. patterns and relationships within and among 
functions and algebraic, geometric and trigonometric 
concepts 

5. mathematical systems (including real numbers, 
whole numbers, integers, fractions), geometry, and 
number theory (including primes, factors, multiples) 

6. discrete mathematics (such as graph theory, counting 
techniques, matrices) 


Fine Arts 


In Fine Arts, students in Missouri public schools will acquire 
a solid foundation which includes knowledge of 


1. process and techniques for the production, exhibition 
or performance of one or more of the visual or 
performed arts 

2. the principles and elements of different art forms 

3. the vocabulary to explain perceptions about and 
evaluations of works in dance, music, theater and 
visual arts 

4. interrelationships of visual and performing arts and the 
relationships of the arts to other disciplines 

5. visual and performing arts in historical and cultural 
contexts 


KNOWLEDGE + PERFORMANCE = ACADEMIC SUCCESS 


Science 


In Science, students in Missouri public schools will acquire a 
solid foundation which includes knowledge of 


1. properties and principles of matter and energy 

2. properties and principles of force and motion 

3. characteristics and interactions of living organisms 

4. changes in ecosystems and interactions of organisms 
with their environments 

5. processes (such as plate movement, water cycle, air 
flow) and interactions of Earth’s biosphere, 
atmosphere, lithosphere and hydrosphere 

6. composition and structure of the universe and the 
motions of the objects within it 

7. processes of scientific inquiry (such as formulating 
and testing hypotheses) 

8. impact of science, technology and human activity on 
resources and the environment 


Health/Physical Education 


In Health/Physical Education, students in Missouri public 
schools will acquire a solid foundation which includes 
knowledge of 


1. structures of, functions of, and relationships among 
human body systems 

2. principles and practices of physical and mental health 
(such as personal health habits, nutrition, stress 
management) 

3. diseases and methods for prevention, treatment and 
control 

4. principles of movement and physical fitness 

5. methods used to assess health, reduce risk factors, 
and avoid high-risk behaviors (such as violence, 
tobacco, alcohol and other drug use) 

6. consumer health issues (such as the effects of mass 
media and technologies on safety and health) 

7. responses to emergency situations 


Missouri Department of Elementary and Secondary Education - DESE 3220-5 Rep 12/09 


Note to Readers: What should high school graduates in Missouri know 
and be able to do? The Missourians who developed these standards wrestled with 
that question. In the end, they agreed that “knowing” and “doing” are actually two 
sides of the same coin. To perform well in school or on the job, one must have a 
good foundation of basic knowledge and skills. Equally important, though, is the 
ability to use and apply one’s knowledge in real-life situations. 

These standards (73 in all) are intended to define what students should learn 
by the time they graduate from high school. On this side are 33 “performance” 
standards, listed under four broad goals. On the reverse side are 40 “knowledge” 
standards, listed in six subject areas. Taken together, they are intended to establish 
higher expectations for students throughout the Show-Me State. These 
standards do not represent everything a student will or should learn. However, 
graduates who meet these standards should be well-prepared for further education, 
work and civic responsibilities. 


All Missourians are eager to ensure that graduates of 
Missouri’s public schools have the knowledge, skills and 
competencies essential to leading productive, fulfilling and 
successful lives as they continue their education, enter the 
workforce and assume their civic responsibilities. Schools need to 
establish high expectations that will challenge all students. To 
that end, the Outstanding Schools Act of 1993 called together 
master teachers, parents and policy-makers from around the state 
to create Missouri academic standards. These standards are the 
work of that group. 

The standards are built around the belief that the success of 
Missouri's students depends on both a solid foundation of 
knowledge and skills and the ability of students to apply their 
knowledge and skills to the kinds of problems and decisions they 
will likely encounter after they graduate. 

The academic standards incorporate and strongly promote 
the understanding that active, hands-on learning will benefit 
students of all ages. By integrating and applying basic knowledge 
and skills in practical and challenging ways across all disciplines, 
students experience learning that is more engaging and 
motivating. Such learning stays in the mind long after the tests 
are over and acts as a springboard to success beyond the classroom. 

These standards for students are not a curriculum. Rather, 
the standards serve as a blueprint from which local school 
districts may write challenging curriculum to help all students 
achieve. Missouri law assures local control of education. Each 
school district will determine how its curriculum will be 
structured and the best methods to implement that curriculum 
in the classroom. 


Authority for the Show-Me Standards: Section 160.514, Revised Statutes of Missouri, 
and the Code of State Regulations, 5 CSR 50-375.100. 


Students in Missouri public schools will acquire the knowledge and 
skills to gather, analyze and apply information and ideas. 


Students will demonstrate within and integrate across all content areas the 
ability to 


1. develop questions and ideas to initiate and refine research 

2. conduct research to answer questions and evaluate information and ideas 

3. design and conduct field and laboratory investigations to study 
nature and society 

4. use technological tools and other resources to locate, select and 
organize information 

5. comprehend and evaluate written, visual and oral presentations and 
works 

6. discover and evaluate patterns and relationships in information, 
ideas and structures 

7. evaluate the accuracy of information and the reliability of its sources 

8. organize data, information and ideas into useful forms (including 
charts, graphs, outlines) for analysis or presentation 

9. identify, analyze and compare the institutions, traditions and art 
forms of past and present societies 

10. apply acquired information, ideas and skills to different contexts as 
students, workers, citizens and consumers 


Students in Missouri public schools will acquire the knowledge and 
skills to communicate effectively within and beyond the classroom. 


Students will demonstrate within and integrate across all content areas the 
ability to 


1. plan and make written, oral and visual presentations for a variety of 
purposes and audiences 

2. review and revise communications to improve accuracy and clarity 

3. exchange information, questions and ideas while recognizing the 
perspectives of others 

4. present perceptions and ideas regarding works of the arts, humanities 
and sciences 

5. perform or produce works in the fine and practical arts 

6. apply communication techniques to the job search and to the workplace 

7. use technological tools to exchange information and ideas 


Students in Missouri public schools will acquire the knowledge and 
skills to recognize and solve problems. 


Students will demonstrate within and integrate across all content areas the 
ability to 


1. identify problems and define their scope and elements
2. develop and apply strategies based on ways others have prevented or
solved problems
3. develop and apply strategies based on one’s own experience in
preventing or solving problems
4. evaluate the processes used in recognizing and solving problems
5. reason inductively from a set of specific facts and deductively from
general premises
6. examine problems and proposed solutions from multiple perspectives
7. evaluate the extent to which a strategy addresses the problem
8. assess costs, benefits and other consequences of proposed solutions


Students in Missouri public schools will acquire the knowledge and 
skills to make decisions and act as responsible members of society. 


Students will demonstrate within and integrate across all content areas the 
ability to 


1. explain reasoning and identify information used to support decisions
2. understand and apply the rights and responsibilities of citizenship in
Missouri and the United States
3. analyze the duties and responsibilities of individuals in societies
4. recognize and practice honesty and integrity in academic work and in
the workplace
5. develop, monitor and revise plans of action to meet deadlines and
accomplish goals
6. identify tasks that require a coordinated effort and work with others to
complete those tasks
7. identify and apply practices that preserve and enhance the safety and
health of self and others
8. explore, prepare for and seek educational and job opportunities


Missouri Department of Elementary and Secondary Education - DESE 3220-5 Rep 12/09 




Appendix B— Grade-Level Assessment Accommodations, Supports and Tools 




About The Updated Accommodations 


The accommodations for the Grade-Level assessments have changed starting with the Spring 2015 Grade- 
Level administration. 


What we previously knew as accommodations has now been split into three areas: Universal Tools, 
Designated Supports and Accommodations. 
• Universal Tools are available to all students taking a Grade-Level or End-of-Course assessment.
• Designated Supports are available to students when deemed appropriate by a team of educators.
• Accommodations must appear in an IEP/504 plan.


On the chart that follows, each tool, support and accommodation has a designation referring to the type of 
assessment it can be used for. Those designations are as follows: 
• Online: If a tool, support or accommodation is designated online, it can only be used with the online
assessment.
• Online (Not Embedded): If a tool, support, or accommodation is designated online (not embedded), it
can only be used with the online assessment but requires software not embedded in the system.
• Paper: If a tool, support, or accommodation is designated paper, it may only be used with the
paper/pencil, Braille or large print assessments.
• Any: If a tool, support, or accommodation is designated any, it may be used with the online,
paper/pencil, Braille or large print assessments.
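
The four format designations above amount to a lookup from a designation to the assessment forms it permits. The following is a minimal Python sketch of that lookup. The names `ALLOWED_FORMS`, `NEEDS_EXTERNAL_SOFTWARE` and `can_use` are hypothetical and are not taken from any DESE system; the sketch only restates the rules listed above.

```python
# Illustrative only: all names here are assumptions, not part of any DESE system.
ONLINE = "online"
PAPER_FORMS = {"paper/pencil", "Braille", "large print"}

# Which assessment forms each format designation permits, per the list above.
ALLOWED_FORMS = {
    "Online": {ONLINE},
    "Online (Not Embedded)": {ONLINE},
    "Paper": set(PAPER_FORMS),
    "Any": {ONLINE} | PAPER_FORMS,
}

# Designations that require software not embedded in the test system.
NEEDS_EXTERNAL_SOFTWARE = {"Online (Not Embedded)"}


def can_use(designation: str, assessment_form: str) -> bool:
    """Return True if an item with this designation may be used on this form."""
    return assessment_form in ALLOWED_FORMS[designation]
```

For example, `can_use("Paper", "Braille")` is true, while `can_use("Online", "large print")` is false.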


For Special Education students, the IEP team should choose all of the designated supports and 
accommodations that a student will receive. 


Some designated supports and accommodations are only for ELL students. ELL students include those 
receiving services (RCV) or not receiving services (NRC). ELL students do not include those students in 
monitored status (MY1 or MY2). 
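
The ELL rule above can be read as a simple membership check on a student's status code. Here is a short Python sketch under that reading. The status codes come from the text; the function name `is_ell_for_supports` is a hypothetical label chosen for illustration.

```python
# Status codes are taken from the text; the function name is an assumption.
ELL_STATUSES = {"RCV", "NRC"}        # receiving services / not receiving services
MONITORED_STATUSES = {"MY1", "MY2"}  # monitored status, not treated as ELL here


def is_ell_for_supports(status: str) -> bool:
    """Return True if the status code counts as ELL for ELL-only supports."""
    return status.upper() in ELL_STATUSES
```

Under this sketch a student coded `RCV` or `NRC` qualifies for ELL-only supports and accommodations, and a student coded `MY1` or `MY2` does not.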


Universal Tools 


• The following is a list of universal tools for the Grade-Level and End-of-Course assessments.
• These tools are available to all students.


Tool | Format | Description
Break (Pause) | Online | The system allows all students to pause the assessment for up to 20 minutes. There is no limit on the amount of times a student may use this tool. If the test is paused for more than 20 minutes the student will be prevented from returning to items already attempted.
Break (Pause) | | All students may take breaks of up to 20 minutes as needed.
Calculator (For Calculator Allowed Items Only) | Online | The system allows all students, on items where calculator use is allowed, to have access to an embedded digital calculator.
Calculator (For Calculator Allowed Items Only) | Any | All students may have access, on items where calculator use is allowed, to a physical calculator.
English Dictionary | Online | The system allows all students access to an embedded English dictionary for use on the writing performance task.
English Dictionary | Any | All students may have access to a physical English dictionary for use on the writing performance task.
Expandable Passages | Online | The system allows all students to expand a passage or item so that it takes up a larger portion of the screen.
Glossary (Grades 3-8 Math and ELA only) | Online | The system allows all students to access an embedded glossary, which shows grade- and context-appropriate definitions of specific construct-irrelevant terms. This tool is not available for Grades 5 and 8 Science assessments.
Highlighter | Online | The system allows all students to have access to a highlighter for marking desired text, questions and answers.
Mark For Review | Online | The system allows all students to mark an item for review. The flag is not saved if a student moves onto another segment or pauses the test for more than 20 minutes.
Notepad (Scratch paper) | Online | The system allows all students to use a digital notepad to make notes about an item. Notes are not saved when a student moves onto the next segment or pauses the test for longer than 20 minutes. During the writing performance task, notes are retained for all portions of the task.
Notepad (Scratch paper) | Paper | All students may have access to physical scratch paper to make notes about an item. Physical scratch paper should be collected and destroyed immediately upon the conclusion of the testing session.
Protractor | Online | The system allows all students to use an embedded protractor on specific items where appropriate.
Protractor | Paper | All students may have access to a physical protractor for use on specific items where appropriate.
Ruler | Online | The system allows all students to use an embedded ruler on specific items where appropriate.
Ruler | Paper | All students may have access to a physical ruler for use on specific items where appropriate.
Spell Check | Online | The system allows all students to use an embedded spell check feature on specific items where appropriate. The spell check feature only indicates that a word is misspelled; it does not provide the correct spelling.
Strikethrough | Online | The system allows all students to cross out answer options.
Thesaurus | Any | All students may have access to a physical thesaurus during the writing performance task.




Writing Tools | Online | The system allows all students to use selected writing tools on specific items where appropriate. The tools include the ability to bold text, italicize text, create bullet points and an undo/redo feature.

they appear larger or smaller than the default size. 

formulas, tables, graphics, etc. 


Designated Supports 
The following is a list of designated supports for the Grade-Level and End-of-Course assessments. 


These supports are available to students when deemed appropriate by a team of educators. 


ELL students include those receiving services (RCV) or not receiving services (NRC). ELL students do not include those 
students in monitored status (MY1 or MY2). 


Support | Format | Description | Code
Bilingual Dictionary | Any | ELL students may have access to a physical bilingual dictionary for use on the writing performance task. | S431
Color Contrast | Online | The system allows students to adjust background or font color based on student needs or preferences. | S101
Color Contrast | Paper | Students may have the test presented to them printed in different colors based on student needs or preferences. | S102
Color Overlay | Paper | Students may have a color transparency placed over the test presented to them based on student needs or preferences. | S103
Glossary (Grades 3-8 Math and ELA only) | Paper | All students taking the paper based, Braille or Large Print assessment may have access to a specific glossary, to be included with the assessment. This support is not available for Grades 5 and 8 Science assessments. | S104
Magnification | Online - Not Embedded | The system allows students to use assistive technology devices to change the size of text, formulas, tables, graphics, etc. beyond the capabilities of the zoom tool. | S105
Masking | Online | … need or that may be distracting by using an embedded masking tool. | S106
Masking | Paper | … immediate need or that may be distracting. | S107
Read-Aloud (For all items in any subject, excluding ELA reading passages) | Online | The system allows items in mathematics and English language arts to be read aloud to the student via embedded text-to-speech technology. The student can control the speed and volume of the voice. | S041
Read-Aloud (For all items in any subject, excluding ELA reading passages) | Online - Not Embedded | Students may use assistive technology text-to-speech software to allow all items in any subject, not including ELA reading passages, to be read aloud. | S042
Read-Aloud (For all items in any subject, excluding ELA reading passages) | | Students may have items in mathematics, science, and English language arts to be read aloud to them by a trained reader. Read Aloud of ELA reading passages requires an IEP or 504 plan. | S043
Read-Aloud (For all items in any subject, excluding ELA reading passages) | | ELL students may have items in mathematics, science, and English language arts to be read aloud to them in their native language by a trained translator. Read Aloud of ELA reading passages requires an IEP or 504 plan. | S111
Scribe (For all items in any subject, excluding ELA writing) | | Students may dictate their responses to a trained scribe, who must follow the administration guidelines. Scribing of ELA writing requires an IEP or 504 plan. | S351
Separate Setting | | Students may be allowed to test in a separate setting from other students. This includes testing individually or testing as part of a smaller … | S501


Designated supports must be turned on prior to testing. 




Translation | | The system allows ELL students to have the test directions for math translated through an embedded feature. |
Translation | | The system allows ELL students to access translated glossaries for selected construct-irrelevant math items. |
Translation | | The system allows ELL students to use stacked translations on selected construct-irrelevant math items. |
Translation | | ELL students may have test directions for math, science and social studies translated. |
Translation | | ELL students may respond to any assessment in their native language. The responses must be translated and then transcribed by a trained scribe, who must follow the administration guidelines. |
Translation | | ELL students taking the paper based, Braille or Large Print assessment may have access to a specific glossary, to be included with the assessment. This glossary can be translated locally. |
Turn Off Universal Tools | Online | The system allows test administrators to turn off universal tools that might be distracting to a student or that students are unable to use. | S100


Accommodations For Students With Disabilities 

The following is a list of accommodations for the Grade-Level and End-of-Course assessments. 

The accommodation must appear in an IEP/504 plan to be allowed. 

ELL students include those receiving services (RCV) or not receiving services (NRC). ELL students do not include those 
students in monitored status (MY1 or MY2). 


Accommodation | Format | Description | Code
Abacus | Any | Students may have access to an abacus. | A391
Alternate Response Options | | Students may respond to items using an alternate option, including but not limited to: Adapted Keyboards, StickyKeys, MouseKeys, FilterKeys, Adapted Mouse, Touch Screen, Head Wand, Switches. |
American Sign Language (ASL) (For math and science items and ELA listening items) | | Students may have math, science, social studies items and ELA listening items translated into ASL. |
Braille | | Students with visual impairments may read text via Braille. Refreshable Braille is available only for ELA. For math, Braille will be presented via embosser. ELA may be presented via embosser. |
Braille | | Students with visual impairments may access the assessment via a Braille version. Tactile overlays and graphics tools may be used to assist the student in accessing the content. |
*INVALIDATION* Calculator GRADE 3 ONLY (For Non-Calculator Allowed Items Only) | | All students in Grade 3 may have access, on items where calculator use is not allowed, to a physical calculator. NOTE: Use of this will result in invalidation. Student will receive lowest obtainable scale score (LOSS). |
*INVALIDATION* Calculator GRADES 4-8 ONLY (For Non-Calculator Allowed Items Only) | | All students in Grades 4-8 may have access, on items where calculator use is not allowed, to a physical calculator. |
*INVALIDATION* Multiplication Table GRADE 3 ONLY | | Students in Grade 3 may have access to a single digit multiplication table. NOTE: Use of this will result in invalidation. Student will receive lowest obtainable scale score (LOSS). |
*INVALIDATION* Multiplication Table GRADES 4-8 | | |
Paper Based Assessment | | Students may have access to a paper based version of the assessment. This can be accessed either by the complete assessment or by printing passages/stimuli/items on demand for the student as determined by the IEP/504. |




Read-Aloud *INVALIDATION* GRADES 3-5 ONLY (ELA reading passages) | | Students in grades 3-5 may have English language arts reading passages read aloud to them by a trained reader. NOTE: Use of this will result in invalidation. Student will receive lowest obtainable scale score (LOSS). | A041
Read-Aloud *INVALIDATION* GRADES 3-5 ONLY (ELA reading passages) | Online - Not Embedded | Students in grades 3-5 may use assistive technology text-to-speech software to allow ELA reading passages to be read aloud. NOTE: Use of this will result in invalidation. Student will receive lowest obtainable scale score (LOSS). | A042
Read-Aloud *INVALIDATION* GRADES 3-5 ONLY (ELA reading passages) | | ELL students in grades 3-5 may have English language arts reading passages read aloud to them in their native language by a trained translator. NOTE: Use of this will result in invalidation. Student will receive lowest obtainable scale score (LOSS). |
Read-Aloud GRADES 6-8 and End-of-Course ONLY (ELA reading passages) | Online | The system allows English language arts reading passages to be read aloud to the student via embedded text-to-speech technology. The student can control the speed and volume of the voice. | A043
Read-Aloud GRADES 6-8 and End-of-Course ONLY (ELA reading passages) | Online - Not Embedded | Students may use assistive technology text-to-speech software to allow ELA reading passages to be read aloud. | A044
Read-Aloud GRADES 6-8 and End-of-Course ONLY (ELA reading passages) | Any | Students may have English language arts reading passages to be read aloud to them by a trained reader. | A045
Read-Aloud GRADES 6-8 and End-of-Course ONLY (ELA reading passages) | Any | ELL students may have English language arts reading passages to be read aloud to them in their native language by a trained translator. | A112
Read-Aloud (ELA reading passages) | | Blind students in any grade who do not yet have adequate Braille skills may have ELA reading passages read aloud. | A046
Scribe (For ELA writing) | Any | Students may dictate their responses to a trained scribe, who must follow the administration guidelines. | A351
Specialized Calculator (For Calculator Allowed Items Only) | | Students may have access, on items where calculator use is allowed, to a specialized calculator, including talking calculators or Braille calculators, when appropriate. | A396
Speech-To-Text | Online - Not Embedded | The system allows students to use voice recognition software so the student may use their voice to dictate responses or give commands. | A352


Appendix C— End-of-Course Assessment Accommodations, Supports and Tools




About The Updated Accommodations 


The accommodations for the End-of-Course assessments have changed starting with the Fall 2014 EOC 
administration. 


What we previously knew as accommodations has now been split into three areas: Universal Tools, 
Designated Supports and Accommodations. 
• Universal Tools are available to all students taking a Grade-Level or End-of-Course assessment.
• Designated Supports are available to students when deemed appropriate by a team of educators.
• Accommodations must appear in an IEP/504 plan.


On the chart that follows, each tool, support and accommodation has a designation referring to the type of 
assessment it can be used for. Those designations are as follows: 
• Online: If a tool, support or accommodation is designated online, it can only be used with the online
assessment.
• Online (Not Embedded): If a tool, support, or accommodation is designated online (not embedded), it
can only be used with the online assessment but requires software not embedded in the system.
• Paper: If a tool, support, or accommodation is designated paper, it may only be used with the
paper/pencil, Braille or large print assessments.
• Any: If a tool, support, or accommodation is designated any, it may be used with the online,
paper/pencil, Braille or large print assessments.


For Special Education students, the IEP team should choose all of the designated supports and 
accommodations that a student will receive. 


Some designated supports and accommodations are only for ELL students. ELL students include those 
receiving services (RCV) or not receiving services (NRC). ELL students do not include those students in 
monitored status (MY1 or MY2). 


Universal Tools 


• The following is a list of universal tools for the Grade-Level and End-of-Course assessments.
• These tools are available to all students.


Tool | Format | Description
Break (Pause) | | All students may take breaks of up to 20 minutes as needed.
Calculator (For Calculator Allowed Items Only) | Online | The system allows all students, on items where calculator use is allowed, to have access to an embedded digital calculator.
Calculator (For Calculator Allowed Items Only) | Any | All students may have access, on items where calculator use is allowed, to a physical calculator.
English Dictionary | Any | All students may have access to a physical English dictionary for use on the writing performance task.
Highlighter | Online | The system allows all students to have access to a highlighter for marking desired text, questions and answers.
Highlighter | | All students may have access to a physical highlighter.
Mark For Review | Online | The system allows all students to mark an item for review. For End-of-Course assessments, flags are saved until the user indicates they are finished with the assessment.
Notepad (Scratch paper) | Paper | All students may have access to physical scratch paper to make notes about an item. Physical scratch paper should be collected and destroyed immediately upon the conclusion of the testing session.
Protractor | Paper | All students may have access to a physical protractor for use on specific items where appropriate.
Ruler | Paper | All students may have access to a physical ruler for use on specific items where appropriate.
Strikethrough | | The system allows all students to cross out answer options.
Thesaurus | Any | All students may have access to a physical thesaurus during the writing performance task.
Zoom | Paper | All students may have access to devices that allow them to change the size of text, formulas, tables, graphics, etc.


Designated Supports 
The following is a list of designated supports for the Grade-Level and End-of-Course assessments. 


These supports are available to students when deemed appropriate by a team of educators. 


ELL students include those receiving services (RCV) or not receiving services (NRC). ELL students do not include those 
students in monitored status (MY1 or MY2). 


Support | Format | Description | Code
Bilingual Dictionary | Any | ELL students may have access to a physical bilingual dictionary for use on the writing performance task. | S431
Color Contrast | Paper | Students may have the test presented to them printed in different colors based on student needs or preferences. | S102
Color Overlay | Paper | Students may have a color transparency placed over the test presented to them based on student needs or preferences. | S103
Magnification | Online - Not Embedded | The system allows students to use assistive technology devices to change the size of text, formulas, tables, graphics, etc. beyond the capabilities of a standard zoom tool. | S105
Masking | Paper | Students may use a masking tool to block off content that is not of immediate need or that may be distracting. | S107
Read-Aloud (For all items in any subject, excluding ELA reading passages) | Online - Not Embedded | Students may use assistive technology text-to-speech software to allow all items in any subject, not including ELA reading passages, to be read aloud. | S042
Read-Aloud (For all items in any subject, excluding ELA reading passages) | | Students may have items in mathematics, science, social studies and English language arts to be read aloud to them by a trained reader. Read Aloud of ELA reading passages requires an IEP or 504 plan. | S043
Read-Aloud (For all items in any subject, excluding ELA reading passages) | | ELL students may have items in mathematics, science, social studies and English language arts to be read aloud to them in their native language by a trained translator. Read Aloud of ELA reading passages requires an IEP or 504 plan. | S111
Scribe (For all items in any subject, excluding ELA writing) | | Students may dictate their responses to a trained scribe, who must follow the administration guidelines. Scribing of ELA writing requires an IEP or 504 plan. | S351
Separate Setting | | Students may be allowed to test in a separate setting from other students. This includes testing individually or testing as part of a smaller … | S501
Translation | Any | ELL students may have test directions for math, science and social studies translated. |
Translation | Any | ELL students may respond to any assessment in their native language. The responses must be translated and then transcribed by a trained scribe, who must follow the administration guidelines. | S109
Translation | Any | ELL students taking the paper based, Braille or Large Print assessment may have access to a specific glossary, to be included with the assessment. This glossary can be translated locally. |


Accommodations For Students With Disabilities 

The following is a list of accommodations for the Grade-Level and End-of-Course assessments. 

The accommodation must appear in an IEP/504 plan to be allowed. 

ELL students include those receiving services (RCV) or not receiving services (NRC). ELL students do not include those 
students in monitored status (MY1 or MY2). 


Accommodation | Format | Description | Code
Abacus | Any | Students may have access to an abacus. | A391
Alternate Response Options | | Students may respond to items using an alternate option, including but not limited to: Adapted Keyboards, StickyKeys, MouseKeys, FilterKeys, Adapted Mouse, Touch Screen, Head Wand, Switches. |
American Sign Language (ASL) (For math, science, social studies items and ELA listening items) | | Students may have math, science, social studies items and ELA listening items translated into ASL. |
Braille | | Students with visual impairments may access the assessment via a Braille version. Tactile overlays and graphics tools may be used to assist the student in accessing the content. |
 | | … Print version. |
 | | … digit multiplication table. |
Paper Based Assessment | | Students may have access to a paper based version of the assessment. |
Read-Aloud (ELA reading passages) | Online - Not Embedded | Students may use assistive technology text-to-speech software to allow ELA reading passages to be read aloud. |
Read-Aloud (ELA reading passages) | | … aloud to them by a trained reader. |
Read-Aloud (ELA reading passages) | | … aloud to them in their native language by a trained translator. |
Read-Aloud (ELA reading passages) | | Blind students in any grade who do not yet have adequate Braille skills may have ELA reading passages read aloud. |
Scribe (For ELA writing) | Any | Students may dictate their responses to a trained scribe, who must follow the administration guidelines. | A351
Specialized Calculator (For Calculator Allowed Items Only) | | Students may have access, on items where calculator use is allowed, to a specialized calculator, including talking calculators or Braille calculators, when appropriate. |
Speech-To-Text | Online - Not Embedded | The system allows students to use voice recognition software so the student may use their voice to dictate responses or give commands. |


Appendix D— Sample Assessment Items 




For 1a-1b, select the symbol (<, >, or =) that should be placed
in the box to make each statement true.


O< O> O= 


O< O> O= 


A carpenter used exactly 25 feet of wood to make 9 shelves of 
equal length. Each shelf measured between — 


A) 1 and 2 feet.
B) 2 and 3 feet.
C) 3 and 4 feet.
D) 4 and 5 feet.


A survey was administered to 500 high school students to 
determine the type of music they prefer. The survey indicated 
that 22% prefer rock, 26% prefer hip hop, 29% prefer pop, and 
23% selected "other." Which representation best illustrates the 
number of students preferring each type of music? 


[Answer choices A) through D) are four graphics, each titled "Preferred Music," covering the categories Rock, Hip Hop, Pop and Other. One choice is a table of Type and Percent of Students; choice D) is a graph with a vertical axis running from 0 to 150.]


What is the value of the numerical expression below? 


{16 += -2 
A) 4 
B) 6 
C) 8 


D) 10 


Read the text and complete the task that follows it.


What Are Coral Reefs? 


The mention of coral reefs generally brings to mind warm 
climates, colorful fishes, and clear waters. However, the reef 
itself is actually a component of a larger ecosystem. The coral 
community is really a system that includes a collection of 
biological communities, representing one of the most diverse 
ecosystems in the world. For this reason, coral reefs often are 
referred to as the "rainforests of the oceans." 


Corals themselves are tiny animals which belong to the group 
Cnidaria (the "c" is silent). Other cnidarians include hydras, 
jellyfish, and sea anemones. Corals are sessile animals, meaning 
they are not mobile but stay fixed in one place. They feed by 
reaching out with tentacles to catch prey such as small fish and 
planktonic animals. Corals live in colonies consisting of many 
individuals, each of which is called a polyp. They secrete a hard
calcium carbonate skeleton, which serves as a uniform base or 
substrate for the colony. The skeleton also provides protection, 
as the polyps can contract into the structure if predators 
approach. It is these hard skeletal structures that build up coral 
reefs over time. The calcium carbonate is secreted at the base 
of the polyps, so the living coral colony occurs at the surface of 
the skeletal structure, completely covering it. Calcium carbonate 
is continuously deposited by the living colony, adding to the size 
of the structure. Growth of these structures varies greatly, 
depending on the species of coral and environmental conditions 
—ranging from 0.3 to 10 centimeters per year. Different species 
of coral build structures of various sizes and shapes ("brain
corals," "fan corals," etc.), creating amazing diversity and
complexity in the coral reef ecosystem. Various coral species 
tend to be segregated into characteristic zones on a reef, 
separated out by competition with other species and by 
environmental conditions. 


Virtually all reef-dwelling corals have a symbiotic (mutually 
beneficial) relationship with algae called zooxanthellae. The 
plant-like algae live inside the coral polyps and perform 
photosynthesis, producing food which is shared with the coral. 
In exchange the coral provides the algae with protection and 
access to light, which is necessary for photosynthesis. The 
zooxanthellae also lend their color to their coral symbionts. 
Coral bleaching occurs when corals lose their zooxanthellae, 
exposing the white calcium carbonate skeletons of the coral 
colony. There are a number of stresses or environmental 
changes that may cause bleaching including disease, excess 
shade, increased levels of ultraviolet radiation, sedimentation,
pollution, salinity changes, and increased temperatures. 


Because the zooxanthellae depend on light for photosynthesis, 
reef-building corals are found in shallow, clear water where light 
can penetrate down to the coral polyps. Reef-building coral
communities also require tropical or sub-tropical temperatures, 
and exist globally in a band 30 degrees north to 30 degrees 
south of the equator. Reefs are generally classified in three 
types. Fringing reefs, the most common type, project seaward 
directly from the shores of islands or continents. Barrier reefs 
are platforms separated from the adjacent land by a bay or 
lagoon. Atolls rest on the tops of submerged volcanoes. They 
are usually circular or oval with a central lagoon. Parts of the 
atoll may emerge as islands. 


Coral reefs provide habitats for a large variety of organisms. 
These organisms rely on corals as a source of food and shelter. 
Besides the corals themselves and their symbiotic algae, other 
creatures that call coral reefs home include various sponges; 
mollusks such as sea slugs, nudibranchs, oysters, and clams; 
crustaceans like crabs and shrimp; many kinds of sea worms; 
echinoderms like star fish and sea urchins; other cnidarians such 
as jellyfish and sea anemones; various types of fungi; sea 
turtles; and many species of fish. 


Item Prompt: 
Summarize the relationship between coral reefs and algae using 


details from the text. 


The Southland 
excerpt from White Fang 
by Jack London 


White Fang, written by Jack London, tells the story of a wild 
wolf dog’s journey to domestication. When he is three years 
old, White Fang is found by Grey Beaver, a Native American 
living in Yukon Territory, Canada. White Fang pulls sleds to help 
Grey Beaver hunt and fish. Grey Beaver then sells White Fang 
to a new owner, who mistreats the wolf dog. Later, Weedon 
Scott becomes White Fang’s owner and begins to further civilize 
the wolf dog by treating him with kindness. 


White Fang landed from the steamer in San Francisco. He was 
appalled. Deep in him, below any reasoning process or act of 
consciousness, he had associated power with godhead. And 
never had the men seemed such marvelous gods as now, when 
he trod the slimy pavement of San Francisco. The log cabins he 
had known were replaced by towering buildings. The streets 
were crowded with perils—wagons, carts, automobiles; great, 
straining horses pulling huge trucks; and monstrous cable and 
electric cars hooting and clanging through the midst, screeching 
their insistent menace after the manner of the lynxes he had 
known in the northern woods. 


All this was the manifestation of power. Through it all, behind it 
all, was man, governing and controlling, expressing himself, as
of old, by his mastery over matter. It was colossal, stunning. 
White Fang was awed. Fear sat upon him. As in his cubhood he 
had been made to feel his smallness and puniness on the day 
he first came in from the Wild to the village of Grey Beaver, so 
now, in his full-grown stature and pride of strength, he was 
made to feel small and puny. And there were so many gods! He 
was made dizzy by the swarming of them. The thunder of the 
streets smote upon his ears. He was bewildered by the 
tremendous and endless rush and movement of things. As 
never before, he felt his dependence on the master, close at 
whose heels he followed, no matter what happened never losing 
sight of him. 


But White Fang was to have no more than a nightmare vision of 
the city—an experience that was like a bad dream, unreal and 
terrible, that haunted him for long after in his dreams. He was 
put into a baggage-car by the master, chained in a corner in 
the midst of heaped trunks and valises. Here a squat and 
brawny god held sway, with much noise, hurling trunks and 
boxes about, dragging them in through the door and tossing 
them into the piles, or flinging them out of the door, smashing 
and crashing, to other gods who awaited them. 


And here, in this inferno of luggage, was White Fang deserted 
by the master. Or at least White Fang thought he was 
deserted, until he smelled out the master’s canvas clothes-bags 
alongside of him, and proceeded to guard them. 


“’Bout time you come,” growled the god of the car, an hour
later, when Weedon Scott appeared at the door. “That dog of 
yourn won't let me lay a finger on your stuff.” 


White Fang emerged from the car. He was astonished. The 
nightmare city was gone. The car had been to him no more 
than a room in a house, and when he had entered it the city 
had been all around him. In the interval the city had 
disappeared. The roar of it no longer dinned upon his ears. 
Before him was smiling country, streaming with sunshine, lazy 
with quietude. But he had little time to marvel at the 
transformation. He accepted it as he accepted all the 
unaccountable doings and manifestations of the gods. It was 
their way. 


There was a carriage waiting. A man and a woman approached
the master. The woman’s arms went out and clutched the 
master around the neck—a hostile act! The next moment 
Weedon Scott had torn loose from the embrace and closed with 
White Fang, who had become a snarling, raging demon. 

“It’s all right, mother,” Scott was saying as he kept tight hold 
of White Fang and placated him. “He thought you were going to 
injure me, and he wouldn't stand for it. It’s all right. It’s all 
right. He'll learn soon enough.” 


“And in the meantime I may be permitted to love my son when
his dog is not around,” she laughed, though she was pale and 
weak from the fright. 


She looked at White Fang, who snarled and bristled and glared 
malevolently. 


“He'll have to learn, and he shall, without postponement,” Scott 
said. 


He spoke softly to White Fang until he had quieted him, then his 
voice became firm. 


“Down, sir! Down with you!” 
This had been one of the things taught him by the master, and 
White Fang obeyed, though he lay down reluctantly and
sullenly.


“Now, mother.” 
Scott opened his arms to her, but kept his eyes on White Fang. 


“Down!” he warned. “Down!” 


In paragraph 8, what does the word placated mean? 


A. turned
B. scolded
C. soothed
D. distracted


Will Fish Farming Save Our Oceans? 


1 Only in the last few decades have people become aware 
that the ocean’s teeming bounty is not, in fact, boundless. 
Until recently, almost all of the seafood eaten worldwide was 
harvested directly from the wild. People depended on the 
natural abundance and resilience of the oceans, rivers, and 
lakes. But as the human population has boomed, the need for 
fish as a food resource has also grown. For billions of people, 
fish are a primary source of protein. In some nations, such as
the United States, where fish has traditionally made up only a 
small portion of the average diet, fish is seen as a healthier 
alternative to beef and pork. Around the world, the demand for 
seafood is on the rise. 


2 However, we can no longer rely on wild-caught seafood, as 
we have in the past. Overfishing, pollution, and loss of habitat 
have strained wild fish populations. There is now an urgent 
need for alternatives. One of these is aquaculture, or fish 
farming. But this solution is not without controversy. 


What Is Aquaculture? 


3 Aquaculture means "farming or cultivating the water." The 
idea of farming fish is certainly not new. Like agriculture, it has 
been practiced since ancient times. But it was not until the 
1960s and 1970s that aquaculture became a significant part of 
global production. It now accounts for more than 40 percent of 
the world’s seafood. 


4 There are two basic types of aquaculture. The first is 
extensive aquaculture. Extensive aquaculturists set up their 
farms in oceans or bays, and natural currents keep the farm’s 
water clean and full of oxygen. Oysters, mussels, and clams are 
raised this way, but so are some large finfish, such as salmon 
and tuna. How do the farmers prevent their mobile crops from 
escaping into the ocean? The fish are kept in cages or “net 
pens” that are anchored to the ocean floor and can be densely 
stocked for higher production. 


5 The other type of aquaculture is intensive. Freshwater fish 
such as catfish, tilapia, and carp are some of the species grown 
by intensive methods. This form of aquaculture relies on man- 
made ponds and advanced technology. One intensive fish farm in 
California grows 5 million pounds of tilapia per year in the middle 
of the desert! Enormous greenhouses with solar-heated tanks 
mimic the tilapia’s natural environment. An advanced computer 
system removes waste, maintains temperature and oxygen 
levels, and feeds the fish on a regular schedule. An average- 
sized tilapia farm may have more than 200,000 fish in the tanks 
at any time. 


A Solution... 


6 Aquaculture seems to offer many advantages over 
traditional fishing. For one thing, fish farms might be able to 
reduce the pressure on wild fish populations. Also, some types 
of seafood are usually available in certain seasons only. Thanks 
to farms, these delicacies are available year-round. With careful 
breeding, farmers have produced 
“domesticated” fish that are fast growing and made-to-order. 
Now restaurants can plan menus knowing that fish of a certain 
kind and size will always be delivered. Reliable production has 
reduced the prices of many kinds of fish, making them more 
accessible as everyday food. 


7 All this spells good news for the consumer. Aquaculture 
also seems to be good for developing nations. For example, on 
Zanzibar, an island off the eastern coast of Africa in the Indian 
Ocean, seaweed raised by aquaculture has become the leading 
export. Researchers are now developing techniques to add 
finfish and shellfish to this production. Local fish farms can 
provide more job opportunities and make cheaper seafood 
available to islanders and for export. 


... or Part of the Problem?


8 But aquaculture’s supposed advantages may be too good 
to be true. In fact, fish farms may not be any healthier for the 
environment. The fish produced in farms must be fed. Their 
food is made from smaller species of “trash” fish, such as 
herring and anchovies, which are harvested directly from the 
ocean, further taxing wild fisheries. It takes two pounds of fish 
food to produce one pound of farmed fish-not a very 
economical ratio, to say the least! 


9 Also, hundreds of thousands of fish are crammed together 
in these floating feedlots, as opponents call them. Fish farms 
create a lot of waste in the form of uneaten food, feces, dead 
fish, and chemicals. In extensive fish farming, this waste is 
flushed by the current into the surrounding ocean and bay, 
where it may affect the ecosystem in unknown ways. Intensive 
fish farmers often dump the waste from their artificial ponds 
and tanks into nearby waterways. 


10 Fish farms not only affect the environment; they may also 
harm communities. The prospects for fish farming in the 
developing world seem promising. But the example of shrimp 
aquaculture in Southeast Asia casts doubts on its benefits for 
local residents. In Thailand and Vietnam, aquaculture has 
impaired rice farming, a traditional and far more efficient means 
of food production. Shrimp farms use up valuable fresh water 
and land resources vital to rice farmers, and the waste released 
into the environment has polluted water and farmland. Also, 
contrary to the hopeful claims of aquaculture advocates, the 
shrimp produced by these farms are not used to feed local 
populations cheaply. Instead, they are sold at high prices to the 
United States and other industrialized nations as luxury items. 


Future Outlook 


11 Although there are compelling reasons to pursue 
aquaculture, it has created a whole new set of problems. With 
careful regulations and management, fish farms may eventually 
become the ideal solution to depleted fisheries, but there is 
much work to be done before this alternate source of seafood is 
truly sustainable. 


Based on what you have read in the passage, which of these 
questions requires further evidence for support? 


A) What are the current methods of aquaculture?
B) What are the reasons for the reduction of fish in the wild?
C) What are some types of fish harvested through aquaculture?
D) What are some ways to limit the negative effects of fish
farming?


Read this sentence from the passage. 


"Fish farms not only affect the environment; they may also 
harm communities." 


Which question would best clarify the idea in the sentence? 
A) How many fish can one fish farm produce in a single year?
B) What is the largest species of fish produced in the fish farms?
C) What are the long-term effects of waste products from fish
farms on humans?
D) How do intensive aquaculture farms keep water conditions
similar to oceans and lakes?


In a certain insect, round wings (R) are dominant to pointed 
wings (r). Which cross will produce the greatest number of 
genotypic and phenotypic variations? 

A) rr x rr
B) Rr x Rr
C) Rr x RR
D) RR x RR


Polar bears swim across large expanses of ocean while hunting 
for seals, their main source of food. The bears use sea ice as 
resting spots during their long swims. However, the sea ice is 
rapidly melting as a result of global warming. Which statement 
describes what most likely will happen if global warming 
continues at its present rate? 


A) Polar bear and seal populations will both increase. 


B) Polar bear populations will decrease, and seal populations will 
increase. 


C) Polar bear populations will increase, and seal populations will 
decrease. 


D) Polar bear populations will decrease, and seal populations 
will remain the same. 


Appendix E - Grade-Level Research Agenda




Smarter Balanced Assessment Consortium:
Comprehensive Research Agenda


Report of Recommendations Prepared by Stephen G. Sireci 


December 31, 2012 


Acknowledgments 


Smarter Balanced is a true collaboration among some of the most talented and dedicated 
educational researchers in the United States. This research agenda could not have been 
produced without the help of many of them, particularly Carole Gallagher, Marty McCall, 
Christyan Mitchell, Joe Willhoft, Joseph Martineau, Vince Dean, Carissa Miller, Steve Slater, 
Randy Bennett, Jacqueline King, Mohamed Dirir, Liru Zhang, Garron Gianopulos, Patricia Reiss, 
and April Zenisky. I am also grateful for the valuable input from the Smarter Balanced Technical
Advisory Committee and the Validation and Psychometrics/Test Design Work Group. Many of 
the ideas for validity studies came from conversations with these colleagues. 


Table of Contents 


I. Introduction
    Purposes of This Report
II. Standards and Guidelines for Test Validation
    The Standards for Educational and Psychological Testing: A Validation Framework
    NCLB Peer Review Guidelines
    Other Validation Guidelines
III. Smarter Balanced Purpose Statements for Validation
IV. Essential Validity Elements for Summative and Interim Assessments
V. Validity Agenda for Summative Assessments
    Summative Assessment Purpose 1
    Summative Assessment Purposes 2 and 3
    Validating College and Career Readiness Benchmarks
    Summative Assessment Purpose 4
    Summative Assessment Purpose 5
    Summative Assessment Purpose 6
    Summative Assessment Purpose 7
VI. Validity Agenda for Interim Assessments
    Interim Assessment Purpose 1
    Interim Assessment Purpose 2
    Interim Assessment Purpose 3
    Interim Assessment Purpose 4
VII. Research Agenda for Formative Assessment Resources
VIII. Summary: The Smarter Balanced Assessment Consortium Validity Argument
    Summarizing the Validity Evidence
IX. Ongoing Validation Activities and Support Systems
References
Appendix A: Smarter Balanced Theory of Action and Derivation of Purpose Statements
Appendix B: Description of Alignment Methods
Appendix C: Description of Item Similarity Rating Approach to Evaluating Test Content
Appendix D: Description of ResidPlots2: IRT Residual Analysis Software




I. Introduction


In September 2010, the U.S. Department of Education awarded $175 million to the Smarter 
Balanced Assessment Consortium (Smarter Balanced) to develop assessments in English 
language arts (ELA) and mathematics that would “provide ongoing feedback to teachers during 
the course of the school year, measure annual student growth, and move beyond narrowly- 
focused bubble tests” (U.S. Department of Education, 2010). This award was part of the federal 
government’s $4.35 billion Race to the Top competitive grant fund, which rewarded states for: 


• Adopting standards and assessments that prepare students to succeed in college and the
  workplace and to compete in the global economy;

• Building data systems that measure student growth and success, and inform teachers
  and principals about how they can improve instruction;

• Recruiting, developing, rewarding, and retaining effective teachers and principals,
  especially where they are needed most; and

• Turning around our lowest-achieving schools. (U.S. Department of Education, 2009a, p. 2)


The goals of Smarter Balanced are comprehensive and are consistent with those of the Race to 
the Top Initiative. At the time of this report, Smarter Balanced represents a consortium of 25 
states working together to develop cutting-edge ELA and mathematics assessments that 
feature computer-adaptive technology, technology-enhanced item formats, summative and
interim assessments, and formative assessment resources. The assessment system being 
developed by the Consortium is designed to provide comprehensive information about student 
achievement that can be used to improve instruction and provide extensive professional 
development for teachers. The Smarter Balanced assessment system focuses on the need to 
strongly align curriculum, instruction, and assessment, in a way that provides valuable 
information to support educational accountability initiatives. 


The specific goals of Smarter Balanced are described in its “Theory of Action,” which is 
presented in Appendix A. The purpose of this report is to outline the research that should be 
conducted to (a) provide information to Smarter Balanced to help the Consortium accomplish 
its goals as it implements the program, and (b) evaluate the degree to which the Consortium is 
meeting its goals. Given that a large part of Smarter Balanced involves developing, 
administering, and scoring the assessments, and reporting the assessment results, much of 
the recommended research is based on the guidance provided by the Standards for 
Educational and Psychological Testing (AERA, APA, & NCME, 1999), hereafter referred to as the 
Standards. 


Purposes of This Report 


The purposes of this report are to inform Smarter Balanced of research that should be done to 
evaluate the degree to which the Consortium is accomplishing its goals and to demonstrate 
that the assessment system adheres to professional and federal guidelines for fair and high- 
quality assessment. The intent is to provide a comprehensive and detailed research agenda for 
the Consortium that includes suggestions and guidance for both short- and long-term research 
activities that will support Consortium goals.


To best inform the Consortium, we provide a description of the Standards, which were used as 
a framework for developing much of the research agenda. Integral to this description is a 
discussion of validity and the test validation process. We also reference the U.S. Department of 
Education’s Standards and Assessments Peer Review Guidance (2009b), which stipulated the 
requirements for assessment programs to receive federal approval under the No Child Left 
Behind (NCLB) legislation. Although not described in this report, the research agenda also 
considered and is consistent with the Joint Committee on Standards for Educational Evaluation 
(JCSEE) Program Evaluation Standards (Yarbrough, Shulha, Hopson, & Caruthers, 2011) as well 
as the Guiding Principles for Evaluators (American Evaluation Association, 2004), which state 
that “evaluators aspire to construct and provide the best possible information that might bear 
on the value of whatever is being evaluated” (p. 1). The research agenda proposed here is 
designed to provide the best possible information to Smarter Balanced for understanding both 
the degree to which the Consortium is meeting its goals as well as what it can do to improve the 
system as it evolves. 


In the remainder of this report, we (a) discuss the development of a validation plan that is 
consistent with the Standards and with the U.S. Department of Education’s Standards and 
Assessments Peer Review Guidance; (b) list the primary purposes and goals of Smarter 
Balanced; (c) list the key validity issues associated with these purposes and goals; and (d) 
provide a description of studies that should be done to provide evidence regarding the degree 
to which Smarter Balanced assessments and activities are meeting the intended goals. 


II. Standards and Guidelines for Test Validation


The Standards for Educational and Psychological Testing: A Validation Framework 


There have been debates regarding what the term “validity” refers to, but for over 50 years 
three organizations—the American Educational Research Association (AERA), the American 
Psychological Association (APA), and the National Council on Measurement in Education 
(NCME)—have worked together to forge a consensus view of validity and provide guidance for 
developing and validating educational and psychological tests (Sireci, 2009). Currently, the 
Standards for Educational and Psychological Testing (AERA et al., 1999) define validity as 
“the degree to which evidence and theory support the interpretations of test scores entailed 
by proposed uses of tests” (p. 9). This definition emphasizes the importance of theory and 
empirical evidence to support the use of a test for a particular purpose. Thus, the research 
agenda for Smarter Balanced must be derived from the intended testing purposes and how 
assessment scores will be used. 


The Standards describe the process of validation as that of developing a convincing argument, 
based on empirical evidence, that the interpretations and actions based on test scores are 
sound. Kane (1992, 2006) characterized this process as a validity argument, which is 
consistent with the validation process described by the Standards. For example, 


A sound validity argument integrates various strands of evidence into a 
coherent account of the degree to which existing evidence and theory support 
the intended interpretation of test scores for specific uses . . . Ultimately, the 
validity of an intended interpretation ... relies on all the available evidence 
relevant to the technical quality of a testing system. This includes evidence of 
careful test construction; adequate score reliability; appropriate test 
administration and scoring; accurate score scaling, equating, and standard 
setting; and careful attention to fairness for all examinees... (AERA et al., 
1999, p. 17) 


This excerpt reinforces the Standards’ emphasis that validation should center on test-score 
interpretation for specific uses. The research agenda developed for Smarter Balanced will be 
designed to fulfill the requirements of a sound validity argument as described by the Standards. 


The Standards’ Five Sources of Validity Evidence. To develop a sound validity argument, the 
Standards provide a validation framework based on five sources of validity evidence. These 
sources are validity evidence based on (a) test content, (b) response processes, (c) internal 
structure, (d) relations to other variables, and (e) consequences of testing. 


Validity evidence based on test content refers to traditional forms of content validity evidence 
such as practice (job) analyses and subject-matter expert review and rating of test 
specifications and test items (Crocker, Miller, & Franks, 1989; Sireci, 1998), as well as newer 
“alignment” methods for educational tests that evaluate the links among curriculum 
frameworks, testing, and instruction (Bhola, Impara, & Buckendahl, 2003; Martone & Sireci, 
2009). Evidence in this category is used to confirm that the tests that students take adequately 
represent the intended knowledge and skill areas. Confirming the degree to which the Smarter 
Balanced test specifications capture the intended Common Core State Standard (CCSS) and 
confirming that the items that students take adequately represent the areas delineated in the 
test specifications are examples of validity evidence based on test content that will be needed 
to build a strong validity argument for the Smarter Balanced assessments. 


Validity evidence based on response processes refers to “evidence concerning the fit between 
the construct and the detailed nature of performance or response actually engaged in by
examinees” (AERA et al., 1999, p. 12). Such evidence can include interviewing test takers
about their responses to test questions, systematic observations of test response behavior, 
evaluation of the criteria used by judges when scoring performance tasks, analysis of item 
response time data, and evaluation of the reasoning processes that examinees use when 
solving test items (Embretson [Whitley], 1983; Messick, 1989; Mislevy, 2009). Such evidence 
will be needed to confirm that the Smarter Balanced assessments are measuring the cognitive 
skills that they intend to measure, and that students are using the targeted skills to respond to 
the test items. 


Validity evidence based on internal structure refers to statistical analysis of item and sub-score
data to investigate the primary and secondary (if any) dimensions measured by an assessment. 
Procedures for gathering such evidence include factor analysis (both exploratory and 
confirmatory) and multidimensional scaling. Internal structure evidence also evaluates the 
“strength” or “salience” of the major dimensions underlying an assessment, and so would also 
include indices of measurement precision, such as reliability estimates, decision accuracy and 
consistency estimates, generalizability coefficients, conditional and unconditional standard 
errors of measurement, and test information functions. In addition, analysis of differential item 
functioning (DIF), which is a preliminary statistical analysis to assess item bias, also falls under 
the internal structure category. 
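
As an illustration of one index of measurement precision named above, the following minimal Python sketch computes coefficient (Cronbach's) alpha and the corresponding classical standard error of measurement from a small matrix of item scores. The function names and the score matrix are hypothetical and are not drawn from Smarter Balanced data; operational analyses would typically rely on IRT-based methods and dedicated psychometric software.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of per-student lists of item scores."""
    n_items = len(scores[0])
    # Variance of each item across students
    item_vars = [pvariance([s[i] for s in scores]) for i in range(n_items)]
    # Variance of the students' total scores
    total_var = pvariance([sum(s) for s in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scores, reliability):
    """Classical SEM: SD of total scores times sqrt(1 - reliability)."""
    total_sd = pvariance([sum(s) for s in scores]) ** 0.5
    return total_sd * (1 - reliability) ** 0.5


# Hypothetical data: 4 students x 3 dichotomously scored items
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(example_scores)
sem = standard_error_of_measurement(example_scores, alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

With real assessments the score matrix would have many more students and items, and alpha would be only one of several precision indices reported alongside conditional standard errors and decision-consistency estimates.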


Evidence based on relations to other variables refers to traditional forms of criterion-related
validity evidence, such as concurrent and predictive validity studies, as well as more 
comprehensive investigations of the relationships among test scores and other variables, such 
as multitrait-multimethod studies (Campbell & Fiske, 1959), and score differences across 
different groups of students, such as those who have taken different courses. These external 
variables can be used to evaluate hypothesized relationships between test scores and other 
measures of student achievement (e.g., test scores and teacher grades), to evaluate the degree 
to which different tests actually measure different skills, and to evaluate the utility of test scores for
predicting specific criteria (e.g., college grades). This type of evidence will be essential for
supporting the validity of certain inferences based on scores from Smarter Balanced
assessments (e.g., certifying college and career readiness). 
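
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustrative example under stated assumptions: the function name, the variable names, and all of the data values are hypothetical. It computes a Pearson correlation between test scores and a relevant external measure (teacher grades), and between test scores and a variable that should be irrelevant to achievement (here, an arbitrary seat number).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical data for five students
test_scores = [410, 450, 480, 520, 560]
teacher_grades = [2.0, 2.5, 3.0, 3.2, 3.8]  # relevant external measure
seat_numbers = [4, 1, 5, 2, 3]              # irrelevant variable

print("scores vs. grades:", round(pearson_r(test_scores, teacher_grades), 2))
print("scores vs. seat numbers:", round(pearson_r(test_scores, seat_numbers), 2))
```

A strong correlation with the relevant measure and a weak correlation with the irrelevant one would be consistent with the hypothesized relationships; an actual validity study would use far larger samples and account for measurement error in both variables.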


Finally, evidence based on consequences of testing refers to evaluation of the intended and 
unintended consequences associated with a testing program. Examples of evidence based on 
testing consequences include investigations of adverse impact, evaluation of the effects of 
testing on instruction, and evaluation of the effects of testing on issues such as high school 
dropout and job applications. Other investigations of testing consequences relevant to the 
Smarter Balanced goals include analysis of students’ opportunity to learn the CCSS, and 
analysis of changes in textbooks and classroom artifacts. With respect to educational tests, the 
Standards stress studying testing consequences. For example, they state, 


When educational testing programs are mandated .. . the ways in which test 
results are intended to be used should be clearly described. It is the 
responsibility of those who mandate the use of tests to monitor their impact and 
to identify and minimize potential negative consequences. Consequences 
resulting from the use of the test, both intended and unintended, should also be 
examined by the test user. (AERA et al., 1999, p. 145). 


Thus, it is important that validity evidence based on testing consequences is prominent in the 
Smarter Balanced research agenda. 


Using the Standards as a Validation Framework. The Standards are considered to be “the most 
authoritative statement of professional consensus regarding the development and evaluation 
of educational and psychological tests” (Linn, 2006, p. 27). Therefore, they have great utility in
guiding a validity agenda. The validation research component of this comprehensive research
agenda is based on crossing the intended purposes and use of Smarter Balanced assessments 
with the Standards’ five sources of validity evidence. Therefore, the first step in determining the 
Smarter Balanced validity research agenda was to explicitly state its goals and purposes. These 
goals and purposes that are the focus of validation are described in Chapter III of this report.


NCLB Peer Review Guidelines 


One of the seven principles underlying the Smarter Balanced Theory of Action is the adherence 
“to established professional standards” (Smarter Balanced, 2010, p. 33). In addition to 
adhering to the Standards, the Consortium will also meet the requirements of the U.S. 
Department of Education’s Peer Review process for NCLB assessments. Although these 
requirements are temporarily suspended as they undergo revision (Delisle, 2012), they remain 
important because they reflect the Department’s most recent standards for ensuring quality 
and equity in statewide assessment programs. Thus, the research agenda incorporates much of 
the guidance provided in the Standards and Assessments Peer Review Guidance (U.S. 
Department of Education, 2009b). There is a great deal of overlap between the Standards and 
the U.S. Department of Education’s Peer Review Guidance. However, the Guidance stipulates 
several important requirements that are highlighted in this research agenda. In particular, it 
requires: 


• Providing evidence of the purpose of an assessment system and studies that support the
  validity of using results from the assessment system for their stated purpose and use
  (p. 42)

• Strong correlations of test and item scores with relevant measures of academic
  achievement, and weak correlations with irrelevant characteristics, such as demographics
  (p. 42)

• Investigations regarding whether the assessments produce intended or unintended
  consequences (p. 42)

• Documentation supporting evidence of the delineation of cut scores and the rationale and
  procedures for setting cut scores (pp. 21-22)

• Evidence of the precision of the cut scores and consistency of student classification (p. 44)

• Evidence of reliability for overall population and for each reported subpopulation (p. 44)

• Evidence of alignment over time through quality control reviews (p. 52)

• Evidence of comprehensive alignment and measurement of the full range of content
  standards and depth of knowledge and cognitive complexity (p. 54)

• Evidence that the assessment plan and test specifications describe how all content
  standards are assessed and how the domain is sampled to lead to valid inferences about
  student performance on the standards, individually and in the aggregate (using impartial
  experts in the process) (p. 54)

• Scores that reflect the full range of achievement standards (p. 57)

• Documentation to describe that the assessments are a “coherent” system across grades
  and subjects including studies establishing vertical scales (p. 34)

• Identification of how each assessment will provide information on the progress of students
  (p. 34)


The overlap of these requirements with the Standards is clear, and the anticipated revisions to 
this guidance will likely retain these key features. For example, in the recent letter informing 
states of the temporary suspension of peer review, the Department reiterated the following 
desired characteristics: 


A high-quality assessment system [is] one that is “valid, reliable, and fair for its intended
purposes; and measures student knowledge and skills against college- and career-ready
standards in a way that

• Covers the full range of those standards, including standards against which student
achievement has traditionally been difficult to measure;

• As appropriate, elicits complex student demonstrations or applications of
knowledge and skills;

• Provides an accurate measure of student achievement across the full performance
continuum, including for high- and low-achieving students;

• Provides an accurate measure of student growth over a full academic year or
course; produces student achievement data and student growth data that can be
used to determine whether individual students are college- and career-ready or on
track to being college- and career-ready;

• Assesses all students, including English language learners and students with
disabilities;

• Provides for alternate assessments based on grade-level academic achievement
standards or alternate assessments based on alternate academic achievement
standards for students with the most significant cognitive disabilities, consistent
with 34 C.F.R. § 200.6(a)(2); and

• Produces data, including student achievement data and student growth data, that
can be used to inform: determinations of school effectiveness for purposes of
accountability under Title I; determinations of individual principal and teacher
effectiveness for purposes of evaluation; determinations of principal and teacher
professional development and support needs; and teaching, learning, and program
improvement.”


These characteristics of high-quality assessment systems were also considered in development 
of the comprehensive research agenda to ensure that evidence will be provided to demonstrate 
that the Smarter Balanced system meets these high standards. 


Other Validation Guidelines 


In addition to the AERA et al. (1999) Standards and the U.S. Department of Education’s (2009) 
Peer Review Guidance, there have been other seminal works that have influenced test 
validation practices. Messick’s (1989) landmark chapter influenced the Standards and 
encouraged validators to focus on test use and the evaluation of testing consequences. Kane 
(1992, 2006), mentioned earlier, advanced Cronbach's (1988) notion of validation as an 
evaluation argument, and this notion is also embodied in the Standards. A recent addition to 
the validity literature is Bennett (2010), who expanded discussion of validation to include 
validation of a theory of action. This perspective is relevant to Smarter Balanced and is 
addressed in Chapter VIII. In short, this comprehensive research agenda incorporates many of 
the current theories and practices in test validation. 


In addition to general guidelines on validation, there are also guidelines for specific testing 
applications. For example, the International Test Commission (ITC) produced Guidelines for
Translating and Adapting Tests (Hambleton, 2005; ITC, 2010), which are relevant to the
evaluation of the Spanish-language versions of the Smarter Balanced mathematics 
assessments. There are also guidelines for universal test design (e.g., Johnstone, Altman, & 
Thurlow, 2006), and sensitivity review (e.g., Ramsey, 1993), which are relevant to the 
evaluation of the development of the Smarter Balanced assessments. Other documents 
consulted to guide this research agenda include Kane’s (1994, 2001) criteria for evaluating 
standard setting studies (described further in Chapter IV) and the recent guidelines published 
by NCME (2012) on maintaining test integrity.


III. Smarter Balanced Purpose Statements for Validation


As mentioned earlier, validation refers to gathering and evaluating evidence with respect to
specific testing purposes. Thus, a first step in developing the comprehensive research agenda
was identifying and articulating the intended purposes of Smarter Balanced. As the AERA et al. 
(1999) Standards state, “When educational testing programs are mandated by school, district, 
state, or other authorities, the ways in which test results are intended to be used should be 
clearly described...” (p. 168). 


Although the Smarter Balanced Theory of Action described the overall goals of the Consortium, 
it was too general for evaluation or validation purposes. Thus, several steps were conducted to 
articulate the primary purposes and goals of Smarter Balanced that would be the focus of 
validation. These steps involved: 


1. Extensive review of Smarter Balanced documentation;
2. Compiling a list of explicit claims, goals, and purposes;
3. Presenting this list to the Smarter Balanced Technical Advisory Committee (TAC);
4. Refining the list based on feedback;
5. Presenting the revised list to Smarter Balanced work groups;
6. Observing the Smarter Balanced Collaboration Conference and discussing goals, purposes,
and validation plans with work groups, staff, and contractors;
7. Developing a draft list of Smarter Balanced goals and purposes to be the focus of
validation;
8. Discussing this list with Smarter Balanced work groups via WebEx teleconferences; and
9. Revising the list based on work group input.


The identification of Smarter Balanced-specific goals began with the Theory of Action (Appendix 
A), but also involved a review of numerous Smarter Balanced documents, including the original 
Race to the Top application (Smarter Balanced, 2010), test specification documents (e.g., ETS, 
2012a, 2012b), press releases, and requests for proposals (RFPs). More than 50 documents 
were reviewed in order to detect any stated claims, purposes, or goals. These reviews led to a 
preliminary list of goals and purposes that were presented to the Smarter Balanced TAC in July 
2012. Feedback was received from the TAC and then from selected members of the Smarter 
Balanced Validation and Psychometrics/Test Design Work Group. Based on this feedback, 
refinements were made to the list of goals and purposes and were shared with Smarter 
Balanced leadership at the Collaboration Conference in September 2012. Further feedback 
was received, which included receipt of other documents that should be factored into the final 
articulation of goals and purposes. 


Based on the observations and interaction with Consortium members, and the feedback 
provided by the TAC and the work group, a focus-group protocol was developed to involve 
Smarter Balanced leadership in the final articulation of testing purposes via WebEx 
teleconferences. Focus groups were held via WebEx in October 2012 with both the Validation 
and Psychometrics/Test Design Work Group and the Test Administration/Student Access Work 
Group. Excluding the facilitator, ten people participated in the first focus group (October 24, 
2012) and sixteen people participated in the second (October 31, 2012). Each focus group was 
90 minutes in duration. Following each focus group, draft purpose statements were sent to the 
participants via SurveyMonkey, and participants rated and commented on the appropriateness 
of the draft purpose statements. Based on these ratings and comments, the draft statements
were revised. These statements were presented to the TAC on December 12, 2012, and
additional feedback was received and incorporated. 


The final list of Smarter Balanced purpose statements that are the focus of validation follows. A
description of the Smarter Balanced Theory of Action is presented in Appendix A to illustrate the 
degree to which the final list of purpose statements covers the major intentions stated in the 
Theory of Action. 


The Smarter Balanced purpose statements for validation are separated into three categories 
that refer to (a) the summative assessments, (b) the interim assessments, and (c) formative 
assessment resources. 


The purposes of the Smarter Balanced summative assessments are to provide valid, reliable, 
and fair information about: 


1. Students’ ELA and mathematics achievement with respect to those CCSS measured by the 
ELA and mathematics summative assessments. 


2. Whether students prior to grade 11 have demonstrated sufficient academic proficiency in 
ELA and mathematics to be on track for achieving college readiness. 


3. Whether grade 11 students have sufficient academic proficiency in ELA and mathematics to 
be ready to take credit-bearing college courses. 


4. Students’ annual progress toward college and career readiness in ELA and mathematics.


5. How instruction can be improved at the classroom, school, district, and state levels.


6. Students’ ELA and mathematics proficiencies for federal accountability purposes and
potentially for state and local accountability systems.


7. Students’ achievement in ELA and mathematics that is equitable for all students and
subgroups of students.

The purposes of the Smarter Balanced interim assessments are to provide valid, reliable, and
fair information about:


1. Student progress toward mastery of the skills measured in ELA and mathematics by the
summative assessments.


2. Students’ performance at the content cluster level, so that teachers and administrators can 
track student progress throughout the year and adjust instruction accordingly. 


3. Individual and group (e.g., school, district) performance at the claim level in ELA and 
mathematics, to determine whether teaching and learning are on target. 


4. Student progress toward the mastery of skills measured in ELA and mathematics across all
students and subgroups of students. 

The purposes of the Smarter Balanced formative assessment resources are to provide
measurement tools and resources to:


1. Improve teaching and learning.

2. Monitor student progress throughout the school year.

3. Help teachers and other educators align instruction, curricula, and assessment.

4. Help teachers and other educators use the summative and interim assessments to improve
instruction at the individual student and classroom levels.

5. Illustrate how teachers and other educators can use assessment data to engage students
in monitoring their own learning.


The remainder of this report centers on these purpose statements and their validation. The 
validation framework for the summative and interim assessments is based on the 
aforementioned five sources of validity evidence described in the Standards and involves 
crossing the purpose statements with each of the five sources. The formative assessment 
resources are not assessments per se, and so the research in support of their intended 
purposes extends beyond the five sources of validity evidence and follows a more traditional 
program evaluation approach. 


As a prelude to Chapters V and VI, Tables 1 and 2 illustrate the validation framework for the
summative and interim assessments by crossing the purpose statements for each component
with the five sources of validity evidence. The check marks in the cells indicate the type of 
evidence that is most important for validating each specific purpose. This presentation is 
extremely general, but indicates the comprehensiveness of the research agenda. It is also 
useful for understanding which sources of validity evidence are most important to specific 
purposes. For example, for purposes related to providing information about students’ 
knowledge and skills, validity evidence based on test content will always be critical. For 
purposes related to classifying students into achievement categories such as “on track” or 
“college ready,” validity evidence based on internal structure is needed, because that evidence 
includes information regarding decision consistency and accuracy. 


Table 1. Validity Framework for Smarter Balanced Summative Assessments


Source of Validity Evidence (column headings): Content; Internal Structure; Relations w/ Ext.
Variables; Response Processes; Testing Consequences


The purposes of the Smarter Balanced summative assessments are to provide valid, reliable, and
fair information about:

1. Students’ ELA and mathematics achievement with respect to those CCSS measured by the ELA
and mathematics summative assessments.

2. Whether students prior to grade 11 have demonstrated sufficient academic proficiency in ELA
and mathematics to be on track for achieving college readiness.

3. Whether grade 11 students have sufficient academic proficiency in ELA and mathematics to be
ready to take credit-bearing college courses.

4. Students’ annual progress toward college and career readiness in ELA and mathematics.

5. How instruction can be improved at the classroom, school, district, and state levels.

6. Students’ ELA and mathematics proficiencies for federal accountability purposes and
potentially for state and local accountability systems.

7. Students’ achievement in ELA and mathematics that is equitable for all students and
subgroups of students.

Table 2. Validity Framework for Smarter Balanced Interim Assessments


Source of Validity Evidence (column headings): Content; Internal Structure; Relations w/ Ext.
Variables; Response Processes; Testing Consequences


The purposes of the Smarter Balanced interim assessments are to provide valid, reliable, and
fair information about:

1. Student progress toward mastery of the skills measured in ELA and mathematics by the
summative assessments.

2. Students’ performance at the content cluster level, so that teachers and administrators can
track student progress throughout the year and adjust instruction accordingly.

3. Individual and group (e.g., school, district) performance at the claim level in ELA and
mathematics, to determine whether teaching and learning are on target.

4. Student progress toward the mastery of skills measured in ELA and mathematics across all
students and subgroups of students.


IV. Essential Validity Elements for Summative and Interim Assessments 


Before describing specific studies associated with each of the testing purposes listed in the previous 
chapter, it is important to first consider the fundamental validity information that is needed for any 
educational assessment program. These “essential elements” cut across the five sources of validity 
evidence and so deserve particular attention. The Standards describe such fundamental information 
as “evidence of careful test construction; adequate score reliability; appropriate test administration 
and scoring; accurate score scaling, equating, and standard setting; and careful attention to fairness 
for all examinees” (AERA et al., 1999, p. 17). Most of these essential elements fall under the 
categories of validity evidence based on test content (e.g., careful test construction) and internal 
structure (adequate score reliability, scaling, equating), but others, such as test administration and 
scoring, and careful attention to fairness, fall outside these two categories and do not neatly fit into 
the others. In addition to these fundamental elements, two other elements are essential: (a) 
equitable participation and access, and (b) test security. 


In this chapter, we describe the types of information needed to confirm that these essential 
elements are adequately addressed in the research agenda. Because these elements refer to 
assessments, they are described in relation to the summative and interim assessments. However, 
“equitable participation and access” is also important with respect to the formative assessment
resources, which are discussed in Chapter VII. 


In Table 3, we present a brief description of the validity evidence for the essential elements 
associated with the summative and interim assessments. Although the preceding quote from the 
Standards mentions adequate “reliability,” we refer more generally to adequate “measurement 
precision” to underscore the need for measurement error to also be conceptualized in other 
frameworks such as item response theory (IRT) and generalizability theory. 


The types of evidence listed in Table 3 will resurface when considering validity evidence for the 
specific purposes described earlier. This recurrence underscores the fundamental nature of these
elements for supporting the use of Smarter Balanced assessments for their intended purposes. Most
of these essential elements are typically addressed in technical manuals that support an 
assessment program. Descriptions of the types of studies to be conducted for each essential 
element follow. 


Careful Test Construction 


As indicated in Table 3, validity evidence of careful test construction can come from a 
comprehensive audit of the test development process. This audit should be a comprehensive review 
of all test development activities, starting with the descriptions of testing purposes, operational 
definitions of the constructs measured, item development, content reviews, alignment studies, 
sensitivity reviews, pilot testing, item analyses, DIF analyses, item selection, item calibration, scoring 
rubrics for constructed-response items, and creation of test booklets (and clarity of test instruction). 
For adaptive assessments, the adequacy of the item selection algorithm, and the stopping rule, 
should also be reviewed. 


Table 3. Validity Evidence Associated with Essential Elements for Summative and Interim Assessments


Essential Element | Validity Evidence

Careful Test Construction | Audit of test development steps, including construct definition (test specifications and blueprints), item writing, content review, item analysis, alignment studies, and other content validity studies; review of technical documentation such as IRT calibration

Adequate Measurement Precision | Analysis of test information, conditional standard errors of measurement, decision accuracy, decision consistency, and reliability estimates for all reported scores

Appropriate Test Administration | Audit of test administration procedures, analysis of test irregularities, analysis of use and appropriate assignment of test accommodations

Appropriate Scoring | Audit of scoring procedures (hand, automated), inter-rater reliability analyses, rater drift (scale stability) analyses, computer/human comparisons (if relevant), generalizability studies, fairness for minorities

Accurate Scaling and Equating | Third-party verification of horizontal and vertical equating, IRT residual analysis, analysis of equating error, documentation of scaling and equating procedures, population invariance of equating

Appropriate Standard Setting | Comprehensive standard setting documentation, including procedural, internal, and external validity evidence for all achievement level standards set on assessments; includes criterion-related studies

Careful Attention to Fairness | Sensitivity review, DIF analyses, differential predictive validity analyses, comparability analyses (for language and disability accommodations), review of accommodation policies, implementation of accommodations, qualitative and statistical analyses of accommodated tests

Equitable Participation and Access | Analysis of participation rates, test accommodations, translations, and other policies

Adequate Test Security | Analysis of data integrity policies, test security procedures, monitoring of test administrations, analysis of cheating behavior, analysis of item exposure, review of chat rooms and websites for exposed items, review of anomalous results


Examples of types of evidence that would be reviewed are presented in Table 4. Although a checklist 
format is used in Table 4, an audit would not simply check whether the activity was in place; rather, it 
would evaluate the quality of the activity. 


Table 4. Sample Checklist for Audit of Test Construction Procedures


Activity | Completed | Not Completed | Comments


Adequate Measurement Precision 


Measurement precision extends the notion of reliability beyond a descriptive statistic for a test. It 
refers to the amount of expected variation in a test score, or classification based on a test score. 
Examples of this information include estimates of score reliability, standard errors of measurement, 
conditional standard errors of measurement, item and test information functions, conditional 
standard error functions, and estimates of decision accuracy and consistency. Estimates of score 
reliability include internal consistency estimates based on a single test administration (coefficient
alpha, stratified alpha, marginal reliability), and those based on testing individuals more than once
(test-retest, parallel forms). The essential information needed for the Smarter Balanced assessments
includes reliability estimates for all scores reported for students, estimates of decision consistency 
and accuracy for any reported achievement level results, and the traditional test information and 
standard error functions associated with IRT analyses. Generalizability studies that focus on specific 
sources of error will be important for identifying the sources of measurement error. 
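

As an illustration of two of the estimates named above, the following minimal sketch computes coefficient alpha from a single administration and the classical standard error of measurement derived from it. The function names and the small response matrix are hypothetical and are not drawn from any Smarter Balanced data; sample (n - 1) variances are assumed throughout.

```python
import math
from statistics import stdev, variance


def coefficient_alpha(scores):
    """Coefficient alpha for item scores (rows = examinees, columns = items)."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # sample variance of each item
    total_var = variance([sum(row) for row in scores])   # sample variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scores):
    """Classical SEM: SD of total scores times sqrt(1 - reliability)."""
    totals = [sum(row) for row in scores]
    alpha = coefficient_alpha(scores)
    # Clamp at zero so floating-point rounding cannot produce a negative radicand.
    return stdev(totals) * math.sqrt(max(0.0, 1 - alpha))


# Hypothetical responses: 5 examinees by 4 dichotomously scored items.
demo_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print("alpha:", round(coefficient_alpha(demo_scores), 3))
print("SEM:", round(standard_error_of_measurement(demo_scores), 3))
```

A single alpha and SEM summarize precision for the whole score scale; the conditional standard errors and decision consistency estimates listed above would require additional analyses that this sketch does not attempt.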


Appropriate Test Administration 


Evidence in this category involves review of test administration manuals and other aspects of the 
test administration processes. This review should include a review of the materials and processes 
associated with both standard and accommodated test administrations. Observations of test 
administrations, and a review of proctor and test irregularity reports, should also be included. The 
policies and procedures for granting and providing accommodations to students with disabilities and 
English language learners should also be reviewed, and case studies of accommodated test
administrations should be selected and reviewed to evaluate the degree to which the policies and
procedures were followed. 


Appropriate Scoring 


Validity evidence to confirm that the scoring of Smarter Balanced assessments is appropriate should 
include a review of scoring documentation. The Standards state that such documentation “should be 
presented... in sufficient detail and clarity to maximize the accuracy of scoring” (AERA et al., 1999, 
p. 47), as should the processes for selecting, training, and qualifying scorers. The scoring processes
should also include monitoring of the frequency of scoring errors and how they are corrected. In 
terms of specific studies, evaluation of scorer reliability and score scale drift should be conducted. If 
any assessments are scored locally, the degree to which the scorers are trained, and the accuracy of 
their scores, should also be studied. Evidence in this category should also confirm that the routing of 
students during the adaptive exams is correct, and that all computerized scoring programs are 
accurate. The Standards also point out that one way to evaluate computerized scoring algorithms is
to commission “an independent review of the algorithms by qualified professionals” (p. 70). 
Generalizability studies to locate sources of measurement error due to scoring will also provide 
important evidence. 
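

One common way to summarize scorer reliability for constructed-response items is the exact agreement rate between two raters together with Cohen's kappa, which corrects that rate for chance agreement. The sketch below is offered only as an illustration; the report does not specify which agreement statistic would be used, and the function names and rubric scores are hypothetical.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Proportion of responses given the same score by both raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per score.
    expected = sum(
        counts_a[score] * counts_b[score] for score in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:  # both raters used a single identical score category
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (0-2) assigned by two raters to eight responses.
first_read = [0, 1, 2, 2, 1, 0, 1, 2]
second_read = [0, 1, 2, 1, 1, 0, 2, 2]

print("exact agreement:", exact_agreement(first_read, second_read))
print("kappa:", round(cohens_kappa(first_read, second_read), 3))
```

The same comparison could be run between human and automated scores, or between an early and a late scoring window as a rough check on rater drift.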


Accurate Scaling and Equating 


Scaling and equating are essential activities for providing valid scores and score interpretations for 
Smarter Balanced assessments. Scaling activities include item calibration and creation of the 
standardized scale on which scores are reported. Equating activities will ensure that different forms 
of the assessments are on a common scale, as are scores reported over time. At the time of this 
writing, the summative assessments are intended to be vertically equated across grades. For the 
adaptive tests, the notion of a test “form” does not apply because the items are calibrated onto a 
common scale and can be assembled together uniquely for each examinee. This process requires 
that the items are correctly calibrated and that the IRT model sufficiently fits the data. Validity 
evidence for scaling and equating will include evaluation of the IRT model, confirming the 
hypothesized dimensionality of the assessments, evaluating equating documentation and estimates 
of equating error, evaluating the viability of a single construct (dimension) across grades, and, 
potentially, evaluating the invariance of the equating functions across important subgroups of 
students, such as students in different states. If funds are available, a “redundancy analysis,” where 
an independent third party replicates the equating done by the contractor, would provide an 
important validity check on the accuracy of the equating. 
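

Because each examinee on an adaptive test receives a unique set of calibrated items, the precision of a score depends on the parameters of the items actually administered. The sketch below illustrates this with item information and the conditional standard error of measurement. It assumes a two-parameter logistic (2PL) model on the logistic metric purely for illustration; the report does not state which IRT model is used, and the item parameters are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, logistic metric)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def conditional_sem(theta, items):
    """Conditional SEM at theta: 1 / sqrt(test information)."""
    test_information = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(test_information)


# Hypothetical (discrimination, difficulty) pairs for the items one examinee saw.
administered_items = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.3), (1.1, 0.8)]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(conditional_sem(theta, administered_items), 3))
```

If calibration is inaccurate or the model fits poorly, the information values and the standard errors built on them are misstated, which is one reason model fit appears among the evidence listed above.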


Appropriate Standard Setting 


When achievement level standards are set on tests, test scores often become less important than 
the classifications that students receive. The standard setting literature is full of different methods 
for setting standards, but regardless of the method used, there must be sufficient validity evidence 
to support the classification of students into achievement levels. The Smarter Balanced summative 
assessments will use achievement levels, some of which will signify that students are “on track” to 
college readiness (grades 3-8) or “college ready” (grade 11). Kane (1994, 2001) wrote about 
gathering and documenting validity evidence for standards set on educational tests and categorized 
the evidence into three categories—procedural, internal, and external. 


Procedural evidence for standard setting “focuses on the appropriateness of the procedures used 
and the quality of the implementation of these procedures” (Kane, 1994, p. 437). The selection of 
qualified standard setting panelists, appropriate training of panelists, clarity in defining the tasks and 
goals of the study, appropriate data collection procedures, and proper implementation of the method 
are all examples of procedural evidence. 


Internal evidence for evaluating standard setting studies focuses on the expected consistency of 
results if the study were replicated. A primary criterion is the standard error of the cut score. 
However, calculation of this standard error is difficult due to dependence among panelists’ ratings 
and practical factors (e.g., time and expense in conducting independent replications). Oftentimes 
evaluations of the variability across panelists within a single study, and the degree to which this 
variability decreases across subsequent rounds of the study, are presented as internal validity 
evidence. However, as Kane (2001) pointed out, 


A high level of consistency across participants is not to be expected and is not 
necessarily desirable; participants may have different opinions about performance 
standards. However, large discrepancies can undermine the process by generating 
unacceptably large standard errors in the cutscores and may indicate problems in 
the training of participants. (p. 73) 


In addition to simply reporting the standard error of the cut score, Kane (2001) suggested that 
consistency can be evaluated across independent panels, subgroups of panelists, or assessment 
tasks (e.g., item formats), or by using generalizability theory to gauge the amount of variability in 
panelists’ ratings attributed to these different factors. Another source of internal validity evidence 
proposed by Kane was to evaluate the performance of students near the cut score on specific items, 
to see if their performance was consistent with the panelists’ predictions. 
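

A minimal sketch of the simplest of these internal checks follows: it summarizes panelists' recommended cut scores in each round and estimates the standard error of the mean cut score as the standard deviation divided by the square root of the number of panelists. That formula treats panelists as independent, which, as noted above, may not hold, so the result is best read as a rough lower bound. The function name and ratings are hypothetical.

```python
import math
from statistics import mean, stdev


def cut_score_summary(panelist_cuts):
    """Mean cut score, SD across panelists, and a naive standard error."""
    n = len(panelist_cuts)
    sd = stdev(panelist_cuts)
    return {
        "cut": mean(panelist_cuts),
        "sd": sd,
        "se": sd / math.sqrt(n),  # assumes independent panelists
    }


# Hypothetical cut scores recommended by six panelists in two rounds.
round_1 = [44, 52, 47, 58, 49, 55]
round_2 = [48, 51, 49, 53, 50, 52]

for label, cuts in (("round 1", round_1), ("round 2", round_2)):
    summary = cut_score_summary(cuts)
    print(label, {key: round(value, 2) for key, value in summary.items()})
```

A smaller standard deviation in the second round would be consistent with emerging consensus, although, as Kane cautions, high consistency is not necessarily expected or desirable.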


External validity evidence for standard setting involves studying the degree to which the 
classifications of students based on test scores are consistent with other measures of their 
achievement in the same subject area. External validity evidence includes classification consistency 
across different standard setting methods applied to the same test, tests of mean differences across 
examinees classified in different achievement levels on other measures of achievement, and the 
degree to which external ratings of student performance are congruent with the students’ test-based 
achievement level classifications. It is likely that external validity evidence will be particularly 
important for validating the “college and career readiness” standards set on the summative 
assessments because several measures of college readiness already exist. In addition to 
classification consistency, the degree to which the constructs measured by these assessments 
overlap with the Smarter Balanced summative assessments, and the degree to which their 
definitions of readiness are similar, should be studied. 


Some specific criteria that can be used to provide validity evidence for standard setting are
summarized in Table 5. This table, adapted from Sireci, Hauger, Wells, Shea, & Zenisky (2009),
illustrates the activities that should be conducted to (a) facilitate validity within the standard setting 
study, (b) evaluate the validity of the standard setting after it has been completed, or (c) do both. 


Table 5. Summary of Criteria for Evaluating Standard Setting Studies


Evidence | Criterion | Brief Explanation

Procedural | Care in selecting participants | Qualifications, competence, and representativeness of panelists; sufficient number of panelists

Procedural | Justification of standard setting method(s) | Degree to which methods used are logical, defensible, and congruent with testing purpose

Procedural | Panelist training | Degree to which panelists were properly oriented, prepared, and trained

Procedural | Clarity of goals/tasks | Degree to which standard setting purposes, goals, and tasks were clearly articulated

Procedural | Appropriate data collection | Data were gathered as intended

Procedural | Proper implementation | Method was implemented as intended

Procedural | Panelist confidence | Panelists understood tasks and had confidence in their ratings

Procedural | Sufficient documentation | Documentation of the entire process so that (a) it is understood and (b) it can be replicated

Internal | Sufficient inter-panelist consistency | Reasonable standard deviations and ranges of cut scores across panelists

Internal | Decreasing variability across rounds | The variability across panelists’ cut scores decreases across rounds (evidence of emerging consensus)

Internal | Small standard error of cut score (consistency within method) | Estimate of degree to which cut scores would change if study were replicated

Internal | Consistency across independent panels | Estimate of degree to which cut scores would change if different panelists were used

Internal | Consistency across panelist subgroups | Estimate of degree to which cut scores would change if specific types of panelists were used

Internal | Consistency across item formats | Estimate of the consistency of cut scores across item formats (e.g., SR, CR items)

Internal | Analysis of borderline students’ performance on specific items | Degree to which expectations of hypothetical borderline students’ performance are consistent with the performance of students near the cut scores

External | Consistency across standard setting methods | Degree to which results from different standard setting methods yield similar results

External | Consistency across other student classification data | Degree to which classifications of students based on external data are congruent with classifications based on the cut scores

External | Mean differences across proficiency groups on external criteria | Degree to which students classified into different achievement levels differ on other relevant variables

External | Reasonableness | Degree to which cut scores produce results that are within a sensible range of expectations

Note: Adapted from Sireci et al. (2009).


Careful Attention to Fairness


Careful attention to fairness begins at the earliest stages of test development and includes many of 
the activities described in the previous section on careful test construction. One important aspect of
fairness is acknowledging the diversity within the student population when defining the constructs
measured. Considerations of this diversity will reduce ethnocentricity in the construct definition and 
allow the development of accommodations policies that stay faithful to the construct measured. 
Sensitivity reviews and analysis of DIF and differential predictive validity are other important aspects 
of test fairness. Ensuring that students have the opportunity to learn material before it is tested and 
ensuring that a fair appeal process is in place are other important aspects of fairness. The presence 
of these practices and policies will be checked as part of the research agenda. The recent NCME 
document on data integrity underscores the need for testing programs to have policies and 
procedures to “ensure that all students have appropriate, fair, and equal opportunities to show their 
knowledge, skills, and abilities” (NCME, 2012, p. 3). 


Equitable Participation and Access 


The Smarter Balanced system is designed for all students, and the intent is to provide flexibility and
remove barriers that may inhibit students from taking the test and performing their best. The system 
is also designed to provide information widely, in transparent fashion, to all stakeholders. Equitable 
participation and access ensures that all students can take the test in a way that allows them to 
comprehend and respond appropriately. The research agenda should include an analysis of 
participation rates across subgroups of students as well as a review of the procedures in place to 
ensure full participation. In particular, the degree to which Smarter Balanced offers sensible 
accommodations for students with disabilities and English language learners should be studied, as 
well as the availability and successful implementation of those accommodations. As stated in the 
recent NCME (2012) guidelines on test integrity, “Students who need accommodations due to 
language differences or students with disabilities may require appropriate modifications to materials 
and administrative procedures to ensure fair access to the assessment of their skills” (p. 3). 


The U.S. Department of Education’s Peer Review Guidance (2009b) provides additional guidance for 
confirming equitable participation and access. For example, it requires: 


• Evidence of judgmental and data-based steps to ensure that assessments are fair and
  accessible to all students (p. 45)
• Evidence of how universal design or linguistic accommodations are incorporated (p. 45)
• Evidence that students with disabilities were included in the development process (p. 45)
• A policy on appropriate selection and use of accommodations (p. 47)
• Routine monitoring of the accommodations used, and assurance that they are the same ones
  used during instruction (p. 49)
• Checks of quality and consistency for accommodations given to English language learners (p. 49)
• Analysis of the effect of accommodation use for English language learner students and
  students with 504s and IEPs (p. 49)


Another aspect of equitable participation and access is the provision of opportunities to retake an 
assessment. According to current policy, Smarter Balanced “will offer a retake opportunity on the 
CAT portion of the summative assessment for students who feel their scores are inaccurate or that 
believe the test was administered under non-standard circumstances” (Smarter Balanced, n.d.). 


1 Marty McCall, personal communication, December 22, 2012. 




Adequate Test Security 


Test security is a prerequisite to validity. Threats to test security include cheating behaviors by 
students, teachers, or others who have access to testing materials. A lack of test security may result 
in the exposure of items before tests are administered, students copying or sharing their answers, or 
changing of students’ answers to test questions. All of these behaviors have been observed in the 
past, and so those who value the validity of test scores worry about the prevalence of cheating 
behaviors. As described by NCME (2012), “When cheating occurs, the public loses confidence in the 
testing program and in the educational system, which may have serious educational, fiscal, and 
political consequences.” 


Thankfully, there are many proactive steps that testing agencies can take to reduce, eliminate, and 
evaluate cheating. The first step is to keep confidential test material secure and have solid 
procedures in place for maintaining the security of paper and electronic materials. The recent NCME 
(2012) document on data integrity outlined several important areas of test security. These areas 
include procedures that should be in place before, during, and after testing. The activities prior to 
testing include securing the development and delivery of test materials. Activities during testing 
include adequate proctoring to prevent cheating, imposters, and other threats. After testing, forensic 
analysis of students’ responses and answer changes, and of aberrant score changes over time, are 
also beneficial. The goal of these security activities is to ensure that test data are “free from the 
effects of cheating and security breaches and represent the true achievement measures of students 
who are sufficiently and appropriately engaged in the test administration” (NCME, 2012, p. 3). 


The evaluation of the test security procedures for the secure Smarter Balanced assessments will 
involve a review of the test security procedures and data forensics. The NCME (2012) document on 
test data integrity should be used to guide this evaluation. This document suggests that security 
policies should address: 


Staff training and professional development, maintaining security of materials and 
other prevention activities, appropriate and inappropriate test preparation and test 
administration activities, data collection and forensic analyses, incident reporting, 
investigation, enforcement, and consequences. Further, the policy should document 
the staff authorized to respond to questions about the policy and outline the roles 
and responsibilities of individuals if a test security breach arises. The policy should 
also have a communication and remediation response plan in place (if, when, how, 
who) for contacting impacted parties, correcting the problem and communicating 
with media in a transparent manner. (p. 4) 


With respect to specific studies that could evaluate security, in addition to an audit of test security 
policies, regular and systematic study of incorrect answer patterns for students who took the test in 
the same setting may be useful. However, with adaptive assessments, the probability of students 
receiving the same items at similar times is very low. Analyses of large score changes over time may 
be more useful, but it is important that any students, classes, or schools flagged for large score gains 
be considered innocent until proven guilty using external data (Wainer, 2011, chapter 8). Finally, 
given that most Smarter Balanced assessments will be delivered via computer, analysis of the time 
that students take to respond to items (e.g., are they correctly answering items in less time than it 
takes to read the item), and when tests are being accessed (are some tests accessed after hours?) 
will also provide important information regarding test security. Appendix C of the NCME (2012) 
document lists other examples of forensic analyses that could be conducted to evaluate test 
security. 
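

One of the timing analyses mentioned above can be sketched as follows. This is a minimal illustration only, not a procedure taken from the NCME (2012) document: the Response record, the flag_rapid_correct function, the per-item minimum reading times, and the 25% flagging threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Response:
    student_id: str
    item_id: str
    correct: bool
    seconds: float  # time the student spent on the item


def flag_rapid_correct(responses, min_read_seconds, max_share=0.25):
    """Return sorted IDs of students whose share of 'rapid correct' answers
    (correct, and faster than the assumed minimum reading time for the item)
    exceeds max_share. Both thresholds are illustrative assumptions."""
    totals = {}
    rapid = {}
    for r in responses:
        totals[r.student_id] = totals.get(r.student_id, 0) + 1
        if r.correct and r.seconds < min_read_seconds[r.item_id]:
            rapid[r.student_id] = rapid.get(r.student_id, 0) + 1
    return sorted(s for s, n in totals.items() if rapid.get(s, 0) / n > max_share)


# Hypothetical data: s1 answers both items correctly in a few seconds.
min_read = {"i1": 10.0, "i2": 12.0}
responses = [
    Response("s1", "i1", True, 3.0),
    Response("s1", "i2", True, 4.0),
    Response("s2", "i1", True, 25.0),
    Response("s2", "i2", False, 5.0),
]
flagged = flag_rapid_correct(responses, min_read)
```

Consistent with the caution above, a flag of this kind would only prompt further review against external data; it would not by itself establish that cheating occurred.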


Summary of Essential Validity Elements 


In considering the essential validity elements that are “relevant to the technical quality of a testing 
system” (AERA et al., 1999, p. 17), we arrive at many of the studies that should be contained within
the comprehensive research agenda. These studies will be highlighted again in the remaining
chapters to underscore how they provide important information relevant to specific purposes of the 
Smarter Balanced Assessment Consortium, and are coordinated with the other studies described in 
the Introduction to this report. 




V. Validity Agenda for Summative Assessments 


As described in Chapter III, there are seven purposes associated with the Smarter Balanced
Summative Assessments that we recommend be the focus of validation. All of the studies discussed 
in Chapter IV that pertain to essential validity elements apply to these purposes. In this chapter, we 
relate these studies to each purpose statement and provide further descriptions where necessary. 


It is important to note that each of the summative assessment purpose statements in Chapter III has 
the common preface “The purposes of the Smarter Balanced summative assessments are to provide 
valid, reliable, and fair information about...” In the sections that follow, we specify each purpose 
statement and then discuss the studies that should be done to provide the evidence to support the 
validity of the purpose. Within each purpose, the studies are organized by the Standards’ five 
sources of validity evidence. 


Summative Assessment Purpose 1: 


Provide valid, reliable, and fair information about students’ ELA and mathematics 
achievement with respect to those CCSS measured by the ELA and mathematics summative 
assessments. 


As indicated in Table 1 (p. 14), validity evidence to support this purpose should come from at least 
three sources—test content, internal structure, and response processes. With respect to validity 
evidence based on test content, studies should be conducted to confirm that the content of the 
summative assessments adequately represents the CCSS intended to be measured in each grade
and subject area. Appraisals of content domain representation and congruence to the CCSS must be
made by carefully trained and independent subject-matter experts, not by employees of or
consultants for the testing contractors. Validity evidence based on internal structure should involve 
analysis of item response data to confirm that the dimensionality of those data match the intended 
structure and support the scores that are reported. All measures of reliability, test information, and 
other aspects of measurement precision are also relevant. Validity evidence based on response 
processes should confirm that the items designed to measure higher-order cognitive skills are 
tapping into those targeted skills. The types of studies that are recommended for each of these three 
sources of validity evidence are described next. 


Validity Studies Based on Test Content. Validity studies based on test content for the Smarter 
Balanced summative assessments need to evaluate the degree to which the assessments 
adequately measure the CCSS that they are designed to measure and in a way that conforms to the 
intended evidence-centered design (ECD; Mislevy & Riconscente, 2006). There should be at least 
two levels to the analysis. The first level would evaluate the degree to which the test specifications 
for the assessment sufficiently represent the intended CCSS. The second level of analysis should 
evaluate the degree to which the items administered to students adequately represent the test 
specifications. Studies relevant to these levels include traditional content validity studies (e.g., 
Crocker et al., 1989) and alignment studies (Bhola et al., 2003; Martone & Sireci, 2009; Porter & 
Smithson, 2002; Rothman, 2003; Webb, 2007). In Appendix B, we present brief descriptions of 
traditional content validity and alignment approaches and how they relate to one another. 


Evaluating test specifications. To evaluate the appropriateness of the test specifications, the process 
by which the specifications were developed should be reviewed to ensure that all member states had 
input and that there was consensus regarding the degree to which the test specifications represent 
the CCSS targeted for the assessment. The degree to which states agree that the test specifications 
appropriately represent the CCSS, given the constraints of the assessment, could be ascertained by 
surveying curriculum specialists in the departments of education in the member states. Surveys
could be constructed where these specialists would respond to selected- and open-response 
questions that would require them to comment on the degree to which the test specifications
adequately define the CCSS intended to be measured on the summative assessments, and the
degree to which the relative weights of the cells in the test specifications reflect the corresponding 
emphases in the CCSS. 


Evaluating content and cognitive representation. To evaluate the degree to which the summative 
assessments adequately represent the test specifications requires recruiting and training qualified 
and independent subject-matter experts (SMEs) in ELA, writing, and mathematics to review the CCSS 
within the test specifications and Smarter Balanced test items. At least two hypothesized aspects of 
the assessments need to be validated using SMEs. First is that the items are appropriately 
measuring the CCSS that they are designed to measure. Second is that the items are measuring the 
breadth of higher- and lower-order cognitive skills that they are designed to measure. There are a 
variety of methods that could be used to evaluate these aspects of content validity—some based on 
traditional notions of content validity, and others based on alignment methodology (Martone & Sireci, 
2009). What the specific method is called is not important. What is important is that the tasks 
presented to the SMEs allow them to provide the data needed to evaluate the degree to which the 
assessments sufficiently represent the intended CCSS and the cognitive skills targeted by these 
standards. 


To evaluate the degree to which each test item adequately represents (i.e., is aligned with) its 
corresponding CCSS, there are several studies that could be conducted, ranging from simply having 
SMEs match test items to claim areas (similar to Webb’s categorical concurrence or Achieve’s 
[2006] blueprint confirmation) to having the SMEs use a Likert-type rating scale to rate the 
congruence between each item and the CCSS that it is designed to measure. An example of the 
“matching” approach is presented in Figure 1, and an example of how the data from such a study 
could be summarized is presented in Figure 2. An example of the rating approach is presented in 
Figure 3; an example of how the rating scale data can be summarized is presented in Figure 4. 


Regardless of the method chosen, appropriately summarizing the results of these content-based 
validity studies is important. Results should be analyzed at the item level to screen out or revise any 
items that have poor alignment ratings. More important, however, is aggregating the data so that the 
representation of the claims or assessment targets within each subject area can be evaluated. 
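

The aggregation described above could be carried out along the following lines for data from the matching task. The sketch assumes a hypothetical data layout (one intended target per item, and one chosen target per SME per item); the function name and the example items are invented for illustration and do not come from an actual study.

```python
from collections import defaultdict


def summarize_matching(intended, sme_choices, threshold=7):
    """For each assessment target, report the number of items and the percentage
    of items matched to the intended target by all SMEs and by at least
    `threshold` SMEs."""
    counts = defaultdict(lambda: {"n_items": 0, "all": 0, "at_least": 0})
    for item, target in intended.items():
        choices = sme_choices[item]
        n_correct = sum(1 for choice in choices if choice == target)
        row = counts[target]
        row["n_items"] += 1
        row["all"] += int(n_correct == len(choices))
        row["at_least"] += int(n_correct >= threshold)
    return {
        target: {
            "n_items": row["n_items"],
            "pct_all": 100 * row["all"] / row["n_items"],
            "pct_at_least": 100 * row["at_least"] / row["n_items"],
        }
        for target, row in counts.items()
    }


# Hypothetical ratings from 10 SMEs on three items.
intended = {"A": "Key Details", "B": "Key Details", "C": "Language Use"}
sme_choices = {
    "A": ["Key Details"] * 10,
    "B": ["Key Details"] * 8 + ["Central Ideas"] * 2,
    "C": ["Language Use"] * 5 + ["Word Meanings"] * 5,
}
summary = summarize_matching(intended, sme_choices)
```

Item-level counts (n_correct) remain available inside the loop, so the same pass could also list the individual items that fall below the threshold and need review.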


In addition to the descriptive summaries of alignment, these studies should also compute 
congruence/alignment statistics. Such statistical summaries range from purely descriptive to those 
that involve statistical tests. On the descriptive end, Popham (1992) suggested a criterion of 7 of 10
SMEs rating an item congruent with its standard to confirm the fit of an item to its standard. This
70% criterion could be applied to the claim level and other aggregations of items. On the statistical
end, several statistics have been proposed for evaluating item-standard congruence, such as 
Hambleton’s (1980) item-objective congruence index and Aiken’s (1980) content validity index. In 
addition, Penfield and Miller (2004) established confidence intervals for SMEs’ mean ratings of 
content congruence. 
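

As a rough illustration of the two ends of this range, the sketch below applies a 7-of-10 rule and computes Aiken's V for one item's ratings on a 1-to-6 scale. Treating a rating of 4 or higher as "congruent" is an assumption made here for the example; it is not part of Popham's or Aiken's work, and the function names are hypothetical. Aiken's V is the sum of each rating's distance above the lowest scale point, divided by the maximum possible sum.

```python
def aiken_v(ratings, low=1, high=6):
    """Aiken's V for one item: 0 when every SME gives the lowest rating,
    1 when every SME gives the highest rating."""
    n = len(ratings)
    return sum(r - low for r in ratings) / (n * (high - low))


def meets_seven_of_ten(ratings, congruent_at=4):
    """True when at least 70% of SMEs rate the item at or above congruent_at.
    The cutoff for 'congruent' is an assumption for this sketch."""
    congruent = sum(1 for r in ratings if r >= congruent_at)
    return 10 * congruent >= 7 * len(ratings)


# Hypothetical ratings from 10 SMEs for a single item.
ratings = [6, 5, 5, 6, 4, 5, 6, 5, 3, 5]
v = aiken_v(ratings)
ok = meets_seven_of_ten(ratings)
```

The significance test and confidence intervals discussed above are not included in this sketch.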




Figure 1. Sample Item/Assessment Target Rating Form for Summative Assessment: Reading (Literary)


Assessment Target (choose one for each item)

Item # | Key Details | Central Ideas | Word Meanings | Reasoning & Evaluation | Analysis within & across Texts | Text Structures & Features | Language Use


From the matching approach (Figure 1), we can see how these data can inform us about the degree 
to which the assessment targets are represented by the items in a general sense. For example, in
Figure 2, we see that the items associated with the assessment target “Analysis within and across 
Texts” were generally considered congruent with this target by the SMEs, but the items measuring 
“Language Use” were less congruent. Specific items could be revised or deleted to improve the 
representation of an assessment target. However, the matching approach does not give us 
information about how well the items measure their associated achievement target. Therefore, the
rating scale approach is preferable, even though it may take slightly longer for the SMEs to provide 
those ratings. 


Figure 2. Example Summary of Item/Assessment Target Congruence


Assessment Target | # of Items | % of Items Classified Correctly by All SMEs | % of Items Classified Correctly by at Least 7 SMEs


Using the rating scale approach (Figure 3), we can get an idea of how well specific items, and the 
group of items comprising a content category or other level of the test specifications, adequately 
measure the intended standard or area, with respect to the characteristics of the rating scale. For 
example, the fictitious results in Figure 4 may suggest that the content categories have good 
representation with respect to the degree to which the items are measuring the CCSS within each
area. However, some specific items should be flagged for review and possibly revised or deleted. A
similar rating task could be used to evaluate how well the items are measuring the intended 
cognitive skills. A cognitive skill dimension was not noted in the current test blueprints for the 
Smarter Balanced summative assessments, and so a cognitive skill classification such as that used 
in the Webb (1999), Achieve (2006), or Porter & Smithson (2002) alignment approaches could be 
adopted and arranged as a rating task, such as those presented in Figure 1 and Figure 3. 




Figure 3. Example of SME Rating Task Assessing Item/CCSS Congruence


Directions: Please read each item and its associated benchmark. Rate how well the item measures its benchmark, using the rating scale provided. Be
sure to circle one rating for each item.


Item | Common Core State Standard (Grade 4 ELA) | How well does the item measure its CCSS? (circle one; 1 = Not at all, 6 = Very well) | Comments
(illegible) | Refer to details and examples in a text when explaining what the text says explicitly and when drawing inferences from the text. | 1 2 3 4 5 6 |
(illegible) | Determine a theme of a story, drama, or poem from details in the text; summarize the text. | 1 2 3 4 5 6 |
(illegible) | Describe in depth a character, setting, or event in a story or drama, drawing on specific details in the text (e.g., a character’s thoughts, words, or actions). | 1 2 3 4 5 6 |
(illegible) | Determine the meaning of words and phrases as they are used in a text, including those that allude to significant characters found in mythology (e.g., Herculean). | 1 2 3 4 5 6 |
(illegible) | Explain major differences between poems, drama, and prose, and refer to the structural elements of poems (e.g., verse, rhythm, meter) and drama (e.g., casts of characters, settings, descriptions, dialogue, stage directions) when writing or speaking about a text. | 1 2 3 4 5 6 |
1614 | Determine a theme of a story, drama, or poem from details in the text; summarize the text. | 1 2 3 4 5 6 |
(illegible) | Determine the meaning of words and phrases as they are used in a text, including those that allude to significant characters found in mythology (e.g., Herculean). | 1 2 3 4 5 6 |
(illegible) | Compare and contrast the point of view from which different stories are narrated, including the difference between first- and third-person narrations. | 1 2 3 4 5 6 |
1733 | Refer to details and examples in a text when explaining what the text says explicitly and when drawing inferences from the text. | 1 2 3 4 5 6 |




Figure 4. Example Summary of Results from Item/CCSS Congruence Study


Item | Content Category | Mean | Median | Aiken Index
1121 | Reading-Literary | | |


Notes: Statistics based on 10 SMEs and rating scale where 1 = Not at all, 6 = Very well. *p < .05.


Given that data from the rating approach can be aggregated and summarized for each of the 
dimensions comprising the test blueprints, we recommend this approach, which can be 
implemented by having SMEs review each item and rate the degree to which it appropriately 
measures the CCSS it is designed to measure. Based on the literature (e.g., O’Neil, Sireci, & Huff, 
2004; Penfield & Miller, 2004), we recommend that at least 10 SMEs be used for each grade and 
subject area. This type of study will provide data that can be used to evaluate the content
representativeness of items, sets of items that comprise an adaptive test for a student, and sets of 
items that comprise assessment targets, claims, or other levels of the test specifications. A 
contractor may propose a more general alignment study involving tasks that differ from those 
recommended here, which may be appropriate. However, the contractor should be required to 
demonstrate how the data will confirm the congruence between the sets of items that comprise an 
assessment for a student and the test specifications, as well as the degree to which the test items 
adequately represent the targeted cognitive skills. Although the adaptive nature of the summative 
assessments makes aggregating content validity results to a test “form” impossible, the 
representativeness of the most common sets of items taken by examinees, or a representative 
sample, could easily be studied (e.g., Crotts, Sireci, & Zenisky, 2012; Kaira & Sireci, 2010). 


The content validity studies should also break out the results by item format. The summative 
assessments will include traditional selected-response items, technology-enhanced items, and 
performance tasks. Ideally, all item formats should have high ratings. 


There is one drawback to the content validation/alignment methods discussed so far. Because the
SMEs are told which CCSS or assessment targets the items are intended to measure, they may
exhibit a “confirmationist bias” or social desirability. That is, the SMEs may unconsciously rate the
items more favorably than they actually perceive them, to please the researchers. One way around
this problem is to have SMEs rate the similarity among pairs of test items and use multidimensional
scaling to analyze their data (D'Agostino, Karpinski, & Welsh, 2011; O’Neil et al., 2004; Sireci &
Geisinger, 1992, 1995). However, this approach is not very common because it takes more time for 
SMEs to complete and involves more complex data analysis. A description of this method appears in 
Appendix C, should concerns about confirmationist bias/social desirability in evaluating test content 
arise. 


Evaluating evidence-centered design. The evidence-centered design (ECD) underlying the 
development of the summative assessments specifies four claims and accompanying rationales in 
each subject area. These claims represent the cognitive models for each subject area. The 
assessment targets provide the evidence to support the claims, and the score reports represent the 
interpretation of the evidence. The content validity studies previously described could be extended to 
evaluate these three components of ECD in each subject area. The survey of curriculum specialists 
described earlier could include questions regarding the soundness of the claims and accompanying 
rationales in each subject area. Second, the studies involving ratings of items could be aggregated at 
the assessment target level to ensure that each target is represented by a sufficient number of items 
that are rated as measuring their intended CCSS well. 


The third aspect of ECD, interpretation, should be evaluated through studies regarding the utility and 
comprehensibility of the summative assessment score reports. Ideas for these studies are described 
later in this report, in sections regarding validity evidence based on testing consequences. The idea 
here is to discover whether users of test reports interpret them correctly (Haertel, 1999), as well as if 
there are means for improving these score reports. It is assumed that studies of this kind will be 
done via piloting of the score reports. However, studies of the utility of the score reports should 
include ascertaining whether the information in the score reports is readily interpretable with respect 
to the intended claims. 


Validity Studies Based on Internal Structure. Validity studies based on internal structure should be 
conducted to support the interpretations made on the basis of scores from the summative 
assessments. The scores reported should demonstrate adequate reliability and confirm the 
hypothesized “dimensionality” of the assessment. Studies in this area will involve analyzing the data 
from students’ responses to the items. 


Dimensionality assessment. With respect to dimensionality, it is presumed that items comprising the 
summative assessments will be calibrated using unidimensional IRT models, which are the most 
common models in contemporary educational assessment. One straightforward way to assess the 
dimensionality of tests calibrated using IRT is residual analysis (Hambleton, 1989; Hambleton &
Rovinelli, 1986). Residual analysis compares the probability of success on an item (predicted by the
IRT model) for students of different proficiency levels to the actual success of students of different 
proficiency levels. 


Two examples of residual analysis plots are presented in Figures 5 and 6. The small circles in each 
figure are “conditional p-values” and represent the proportion of students, within a certain test score 
interval, who correctly answered the item. That is, they are proportion-correct statistics, conditional 
on test score (actually, conditioned on the IRT estimate of true score, called theta). The vertical lines 
spreading from these conditional p-values illustrate the confidence intervals for the probability
estimates based on the IRT model. The item displayed in Figure 5 displays good fit, in that the IRT
model for this item essentially runs through the conditional p-values. The item displayed in Figure 6
does not fit well, as several of the conditional p-values are far off the item characteristic curve 
specified by the IRT model. 
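

The comparison behind such plots can be sketched in a few lines. The example below is not the ResidPlots2 implementation; it is a minimal illustration that assumes a three-parameter logistic model with the conventional scaling constant D = 1.7, groups students into theta intervals chosen by the analyst, and evaluates the model at each interval's midpoint. The function names and the interval edges are hypothetical.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Model-predicted probability of a correct response (3PL; D = 1.7 assumed)."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def residual_table(thetas, scores, a, b, c, edges, D=1.7):
    """Compare conditional p-values with model predictions in each theta interval.

    thetas: estimated proficiencies; scores: 0/1 responses to one item;
    edges: increasing interval boundaries (an analyst's choice)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        group = [x for t, x in zip(thetas, scores) if lo <= t < hi]
        if not group:
            continue  # no students in this interval
        n = len(group)
        observed = sum(group) / n            # conditional p-value
        midpoint = (lo + hi) / 2
        expected = p_3pl(midpoint, a, b, c, D)
        se = math.sqrt(expected * (1 - expected) / n)
        rows.append({
            "theta": midpoint,
            "n": n,
            "observed": observed,
            "expected": expected,
            "raw": observed - expected,
            "standardized": (observed - expected) / se,
        })
    return rows


# Tiny hypothetical example: four students in one interval centered at theta = 0.
rows = residual_table([-0.1, 0.1, 0.05, -0.05], [1, 1, 0, 1],
                      a=1.0, b=0.0, c=0.0, edges=[-0.5, 0.5])
```

Large standardized residuals in several intervals would mark an item for closer inspection, in the same way that the conditional p-values in Figure 6 fall far from the item characteristic curve.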


Inspection of residual plots is descriptive in nature, and there are statistical indices that can be used 
to flag items that do not fit the IRT model. Such analyses are important for the summative 
assessments, to make sure that the various item types used are all adequately fit by the IRT model. 
More importantly, however, summary statistics across all items can be used to evaluate the degree 
to which the IRT model fits the data for all items comprising an assessment, and hence the degree to
which the IRT assumption of unidimensionality holds (note that a lack of fit may indicate a problem
other than multidimensionality). All of the aforementioned analyses can be conducted using 
customized software, or the free ResidPlots2 residual analysis software developed by Liang, Han, 
and Hambleton (2008, 2009).2 The ResidPlots2 software allows users to simulate data that fit the 
IRT model, to gauge the degree to which the observed test data deviate from chance expectations, 
assuming the IRT model is true. This analysis can be useful for evaluating overall IRT model fit to the 
data. Further description of ResidPlots2 appears in Appendix D.


It should be noted that most IRT software programs produce residual plots and statistical measures
of fit, such as the chi-square statistic. If the Smarter Balanced assessments were calibrated using
the Rasch model, the Infit and Outfit measures of item fit could also be used to evaluate IRT model
fit (e.g., Linacre, 2004).3
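

For readers unfamiliar with these statistics, the sketch below shows one common way the unstandardized (mean square) versions are computed for a single dichotomous item under the Rasch model. It is an illustration under stated assumptions, not the routine of any particular software package: proficiency estimates and the item difficulty are taken as known, and the function names are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch model probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - b)))


def item_fit(thetas, scores, b):
    """Return (infit, outfit) mean squares for one item.

    Outfit is the unweighted mean of squared standardized residuals.
    Infit weights each squared residual by its model variance, so responses
    from students whose proficiency is near the item difficulty count more."""
    squared_residuals = []
    variances = []
    for theta, x in zip(thetas, scores):
        p = rasch_p(theta, b)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1 - p))
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(scores)
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical case: a high-proficiency student (theta = 3) misses an item of
# difficulty 0, an unexpected response far from the item's difficulty.
infit, outfit = item_fit([0.0, 0.0, 0.0, 3.0], [1, 0, 1, 0], b=0.0)
```

In this example the single unexpected response inflates Outfit more than Infit, which is the practical consequence of the weighting difference.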


2 Available for free from the University of Massachusetts at http://www.umass.edu/remp/software/residplots/. 


3 Both Infit and Outfit summarize the residuals between a student’s observed pattern of responses to a set of 
items and the pattern predicted from the IRT model. The difference between the two measures is that the Infit 
measure weights items “closer” to a student’s proficiency (theta) score more heavily than items further from 
the student’s proficiency, whereas the Outfit statistic does not involve weighting. Each statistic represents a 
mean square error of the residuals and each has a standardized version. 




Figure 5. IRT Residual Analysis Plot from ResidPlots-2 (good model fit)

[Plot: Raw Residuals. Item 23(1) / Sample Size: 3000 / a=0.98 / b=-0.56 / c=0.20. Horizontal axis: Latent Trait (Theta).]


Figure 6. IRT Residual Analysis Plot from ResidPlots-2 (poor model fit)

[Plot: Raw Residuals. Item 5(1) / Sample Size: 3000 / a=0.79 / b=-1.93 / c=0.00. Horizontal axis: Latent Trait (Theta).]




There are more comprehensive methods for assessing the dimensionality of an educational 
assessment, such as exploratory and confirmatory factor analysis and multidimensional scaling (see 
Hattie, 1985, or Sireci, 1997, for reviews of methods). Some of these methods are recommended for 
validity studies related to other Smarter Balanced purposes. For purpose 1, which is focused on 
whether the assessments are valid and reliable measures of the CCSS, evaluating dimensionality via 
residual analysis should be sufficient. An advantage of IRT residual analysis is that it can be easily 
conducted on “incomplete” data sets that result from adaptive testing—that is, the student-by-item 
data file is incomplete in that not all students respond to all items. Such nonrandom, missing data is 
difficult to analyze using standard factor analytic procedures (cf. Sireci, Rogers, Swaminathan, 
Meara, & Robin, 2000). 


Measurement precision. Purpose 1 for the summative assessments specifies reliable measures, 
which involve an analysis of the precision of the assessments. Measurement precision refers to the 
amount of error, or variation, expected in a student’s test score if the student were repeatedly 
tested. It is closely related to test score reliability, which is an estimate of the consistency or stability
of the score. As described by Anastasi (1988): 


Reliability refers to the consistency of scores obtained by the same persons when 
reexamined with the same test on different occasions or with different sets of 
equivalent items, or under other variable examining conditions. This concept of 
reliability underlies the computation of the error of measurement of a single score, 
whereby we can predict the range of fluctuation likely to occur in a single individual’s 
score as a result of irrelevant, chance factors. (p. 109) 


Measurement precision is a broader term than reliability and refers to both estimates of score
reliability and other descriptions of measurement error. A great deal of statistical theory has been 
developed to provide indices of the reliability of test scores as well as measures of measurement 
error throughout the test score scale. Classical test theory defines reliability as the squared 
correlation between observed test scores and their unbiased values (“true scores”). Reliability 
indices typically range from 0 to 1, with values of .80 or higher signifying test scores that are likely to
be consistent from one test administration to the next. 


Reliability indices are based on “classical” theories of testing. These estimates are reconceptualized 
in IRT, which characterizes measurement precision in terms of test information and conditional 
standard error. Therefore, the recommended measurement precision studies to support purpose 1
include estimates of score reliability (both coefficient alpha and stratified alpha, where relevant) and
analysis of conditional standard errors of measurement based on IRT (e.g., test information functions 
and standard-error functions). Estimates of decision consistency, decision accuracy, and 
generalizability studies will be discussed in the sections related to other study purposes. 




Validity Studies Based on Response Processes. The CCSS specify a wide range of knowledge and 
skills in each subject area. For example, two standards in high school geometry are: 


Know precise definitions of angle, circle, perpendicular line, parallel line, and line segment, 
based on the undefined notions of point, line, distance along a line, and distance around a 
circular arc. 


and 


Construct an equilateral triangle, a square, and a regular hexagon inscribed in a circle. 
(NGA Center & CCSSO, 2010, p. 76) 


The first standard represents a lower cognitive level of knowledge, while the second represents a 
higher level involving synthesis of several geometrical concepts. Evidence based on students’ 
response processes could help validate that the summative assessment items are measuring the 
lower- and higher-order cognitive skills specified in the CCSS. One relatively easy study that could be 
done is an analysis of the amount of time it takes students to respond to items of various (purported) 
cognitive complexity. Students’ response-time data should be readily available after the pilot tests, 
and the hypothesis that the items measuring higher-order skills will take more time for students to 
complete could be tested using analysis of variance (ANOVA).⁴ In addition, cognitive interviews or
think-aloud studies could be conducted to best understand students’ thought processes as they 
respond to items of varying cognitive complexity (Hamilton, 1994; Leighton, 2004). 
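
The response-time analysis could be sketched as follows. This is a simplified illustration: it applies a natural-log transformation to the (typically skewed) response times and computes a one-way ANOVA F statistic across purported cognitive levels. The data are invented, the function name is hypothetical, and the sketch treats observations as independent, which an operational analysis of repeated responses from the same students would need to address.

```python
import math


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numeric observations, one list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical response times in seconds, grouped by purported cognitive level.
times_by_level = {
    "low": [20.0, 25.0, 31.0, 18.0],
    "medium": [40.0, 52.0, 47.0, 61.0],
    "high": [90.0, 75.0, 120.0, 140.0],
}

# Log-transform before the analysis because response times are positively skewed.
log_times = [[math.log(t) for t in times] for times in times_by_level.values()]
f_stat, df_b, df_w = one_way_anova_f(log_times)
```

The F statistic would then be referred to an F distribution with `df_b` and `df_w` degrees of freedom to obtain a p-value; that step is omitted here.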


Summative Assessment Purposes 2 and 3: 


Provide valid, reliable, and fair information about whether students prior to grade 11 have 
demonstrated sufficient academic proficiency in ELA and mathematics to be on track for 
achieving college readiness. 


and 


Provide valid, reliable, and fair information about whether grade 11 students have sufficient 
academic proficiency in ELA and mathematics to be ready to take credit-bearing college 
courses. 


These two purpose statements reflect the fact that the Smarter Balanced summative assessments 
will be used to classify students into achievement levels. Before grade 11, one achievement level will 
be used at each grade to signal whether students are “on track” to college readiness. At grade 11, 
the achievement levels will include a “college and career readiness” category. Such classification 
decisions require validation. Validity evidence for these purposes should come from four sources— 
test content, internal structure, relations with external variables, and testing consequences. In 
addition, because these classification decisions represent achievement level standards, Kane’s 
(1994) sources of validity evidence for standard setting—procedural, internal, and external—are also 
relevant. However, we note that Kane’s external evidence overlaps considerably with validity 
evidence based on relations with external variables. 


Summative assessment purposes 2 and 3 differ with respect to grade level, with the assessments 
prior to grade 11 being used to predict whether students are “on track” for college and career 
readiness, and the grade 11 assessments used for certifying certain academic aspects of college 
and career readiness. This difference involves somewhat different types of validation evidence. In 
particular, because there has been a great deal of work on assessing college readiness, there are 
more potential validation criteria for the grade 11 college readiness classification. 


4 Note that response-time data are typically highly positively skewed, and so a natural log or similar 
transformation would be needed for this analysis. 




Validating “On Track” Based on Content Validity Evidence. Being on track for college readiness 
implies acquisition of knowledge, and mastery of specific skills, thought to be important as students 
progress through elementary, middle, and high school. These specific knowledge and skills are 
stipulated in the CCSS. Therefore, the validity studies described earlier for purpose 1 are all relevant
here. Essentially, the validity studies based on test content that were described for purpose 1 need 
to confirm that the summative assessments are targeting the correct CCSS and adequately 
represent these standards. However, such studies will not confirm that the CCSS actually contain the 
appropriate knowledge and skills to support college and career readiness. Rather, the CCSS would
need to be reviewed to confirm that they contain the appropriate knowledge and skills that students 
need in order to be on track for college and career readiness. 


One way to evaluate the appropriateness of the CCSS for determining whether students are on track 
for college and careers is to conduct a survey of state educators. At the postsecondary level, Conley, 
Drummond, Gonzalez, Rooseboom, and Stout (2011) conducted a national survey of postsecondary 
institutions to evaluate the degree to which the grade 11 and grade 12 CCSS contain the knowledge 
and skills associated with college readiness. They found that most (of almost 2,000) college 
professors rated these CCSS as highly important for readiness in their courses. A similar type of 
survey of educators in participating states would be helpful for evaluating the CCSS in ELA and math 
in grades 3 through 8. A major question motivating the survey would be: Are the CCSS in these 
grades appropriate for preparing students for college and careers? 


In addition to these studies, it should be noted that studies involving validity evidence based on 
relations with other variables will also require validity evidence based on test content. For example, 
when Smarter Balanced assessment scores are compared with other test scores, the similarity of 
content across the two tests will need to be assessed. 


Validating “On Track” Based on Internal Structure Evidence. 


Decision consistency and decision accuracy studies. Given that purpose 2 involves the achievement 
level classification of “on track,” in addition to the measurement precision studies described earlier 
for purpose 1 (IRT residual analysis, reliability estimates, information functions, etc.), evidence that 
the classifications assigned to students are reliable is needed. Therefore, estimates of decision 
consistency (DC) and decision accuracy (DA) are needed, as are estimates of the precision of 
measurement around the “on track” cut score (i.e., conditional error of measurement at that point). 


In essence, DC refers to the consistency of student classifications resulting either from two 
administrations of the same examination or from parallel forms of an examination. Thus, the concept 
is similar to reliability, but instead of consistency of a score, it refers to consistency of classifications 
across repeated testing. DA can be thought of as the extent to which the observed classifications of 
students agree with the students’ “true” classifications. Estimates of DA compare the classifications 
into which students are placed based on their test score with estimates of their true classifications. 
However, because students’ true proficiencies are never known, simulation studies or some type of 
split-half estimate are typically used to estimate DA.


There are several statistical approaches for estimating DA and DC. Livingston and Lewis (1995) 
introduced a method for estimating DC and DA based on a single administration of a test, using 
classical test theory. More recently, IRT-based methods have been proposed (Lee, 2008; Rudner, 
2001, 2004) and are more common for IRT-based tests. Free software for estimating DC and DA for 
IRT-based tests, such as the Smarter Balanced summative assessments, is available (Lee, 2008),⁵
although some adjustments may need to be made for the adaptive test design. Another option would 
be the approach used by Hambleton and Han (2004), who estimated DA and DC by simulating data 


5 This software, IRT-Class, is available for free from the University of Iowa via


based on IRT item parameter estimates, and by comparing the consistency of classification over 
simulated examinees. 
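
As a rough illustration of an IRT-based estimate, the sketch below follows the general idea associated with Rudner (2001): treat each student's ability estimate and its standard error as defining a normal distribution, and ask how likely it is that the student falls on the same side of the cut score. It is a simplification for a single cut score, not a reproduction of any published procedure or software. The consistency term in particular approximates a second administration by a fresh draw from the same normal distribution, which is an assumption of this sketch.

```python
from statistics import NormalDist


def decision_accuracy_consistency(theta_hats, ses, cut):
    """Approximate decision accuracy (DA) and consistency (DC) for one cut score.

    theta_hats: ability estimates; ses: their standard errors (all > 0).
    """
    acc = 0.0
    con = 0.0
    for theta_hat, se in zip(theta_hats, ses):
        # Probability mass above the cut under N(theta_hat, se^2).
        p_above = 1.0 - NormalDist(theta_hat, se).cdf(cut)
        # Accuracy: probability the "true" class matches the observed class.
        acc += p_above if theta_hat >= cut else 1.0 - p_above
        # Consistency: probability two independent draws land in the same class.
        con += p_above ** 2 + (1.0 - p_above) ** 2
    n = len(theta_hats)
    return acc / n, con / n
```

Students whose estimates sit close to the cut contribute values near .50 to both indices, which is why the conditional standard error at the cut score matters for these classifications.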


Estimating the cut-score standard error. As Kane (1994, 2001) discussed, analysis of the expected
amount of variability in the cut score resulting from a standard setting study should be considered in 
validating an achievement level standard. As part of the documentation for setting the “on track” 
standard and other achievement level standards on the summative assessments, estimates of cut- 
score variability should be provided. These descriptive statistics estimate the amount of change 
expected in a cut score if the study were replicated using different panelists, items, or standard 
setting methods. Sireci et al. (2009) provided examples of several different methods for evaluating 
the cut scores established on a grade 12 National Assessment of Educational Progress (NAEP) 
mathematics assessment. These methods range from simply computing the standard error of the 
mean across panelists to replicating the standard setting study using an independent standard 
setting panel. 


For the “on track” college readiness standards below grade 11, estimates of cut-score variability 
should be documented, but should also be communicated to Smarter Balanced leadership before 
the cut scores are finalized. The specific estimates to be used are somewhat dependent on the 
standard setting method. Most methods involve cut-score recommendations for each panelist, and 
so the standard error of the panelist mean can be computed. Where multiple rounds of standard 
setting are conducted in a study, the variability (e.g., standard deviation, standard error of the mean) 
across rounds can be calculated, with the expectation that variability will decrease across rounds.⁶
When the panelists’ median cut score is used, standard errors for the median can be computed 
based on bootstrapping (e.g., Sireci et al., 2009) and other procedures. 
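
The two within-panel estimates described above can be sketched briefly. The panelist cut scores would come from the standard setting study; the function names, the number of bootstrap replications, and the fixed seed are choices made for this illustration only.

```python
import random
from statistics import median, stdev


def se_of_mean(cuts):
    """Standard error of the mean of panelists' recommended cut scores."""
    return stdev(cuts) / len(cuts) ** 0.5


def bootstrap_se_of_median(cuts, n_boot=2000, seed=0):
    """Bootstrap standard error of the panelists' median cut score.

    Resamples panelists with replacement and takes the standard deviation
    of the resampled medians.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(n_boot):
        resample = [rng.choice(cuts) for _ in cuts]
        medians.append(median(resample))
    return stdev(medians)
```

Both estimates treat panelists' ratings as independent observations, so they are subject to the same caution about underestimating variability when panelists discuss their ratings.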


A better estimate of cut-score reliability is based on the variability across independent standard 
setting panels. Brennan (2002) showed that when there are only two independent observations, 
such as two means from two separate standard setting studies, the standard error of the mean is 


$$\hat{\sigma}_{\bar{X}} = \frac{|\bar{X}_1 - \bar{X}_2|}{2}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the means across panelists in the two standard setting studies. For Smarter
Balanced summative assessments that involve high-stakes standards, we recommend that
independent standard setting studies be conducted so that the variability across recommended cut 
scores can be estimated. 
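
The two-study result follows from the usual standard error of a mean with only two observations. A short derivation is given below, with a numerical example; the cut scores 520 and 532 are invented for illustration.

```latex
\bar{X} = \frac{\bar{X}_1 + \bar{X}_2}{2}, \qquad
s^2 = \frac{(\bar{X}_1 - \bar{X})^2 + (\bar{X}_2 - \bar{X})^2}{2 - 1}
    = \frac{(\bar{X}_1 - \bar{X}_2)^2}{2}

\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{2}}
    = \frac{|\bar{X}_1 - \bar{X}_2|}{2}, \qquad
\text{e.g., } \frac{|520 - 532|}{2} = 6
```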


Validating “On Track” Based on Relations with External Variables. It is likely that one of the 
achievement level standards set on the ELA and Math summative assessments will be used as the 
“on track” designation in each grade level. For example, the “Proficient” standard in each grade 
might be used. Validating this specific score interpretation based on the relations of scores with 
other variables requires other measures of students’ mastery of grade-level knowledge and skills. 
Examples of external variables that could be used are teachers’ ratings of students’ preparedness 
for the next grade and other standardized assessments. Welch and Dunbar (2011), for example, 
explored the use of the Iowa Tests of Basic Skills (ITBS) for determining college readiness from
grades 5 through 11. To accomplish this task, they first explored the relationship between the ITBS 
and the ACT composite scores for students who had taken the ITBS across grades and who had 
taken the ACT. The correlations between ITBS scores and the ACT ranged from .82 to .87 from 
grades 5 through 11. Next, for grade 11, they found the ITBS score that maximized classification 


6 Although computing statistics such as the standard error of the mean is common in standard setting studies, 
when panelists discuss their ratings, the independence-of-observations assumption is violated, and so this 
estimate of variability probably underestimates the true variability across independent panelists. 




congruence with the ACT college readiness benchmark score (their study involved students who took 
both assessments). Using the corresponding ITBS percentile rank scores at the lower grade levels, 
they found about an 80% accuracy rate for predicting the ACT benchmark. However, they suggested 
putting error bands around the “on track” benchmark, and if a student’s score was within the error 
band, the student could be considered on track. 
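
The idea of choosing a score that maximizes classification congruence with an external benchmark can be sketched as a simple search over candidate cut scores. This is an illustration of the general approach, not the procedure used by Welch and Dunbar (2011). The function names are hypothetical, and the error-band rule reflects one reading of their suggestion (a score no more than one band width below the cut counts as on track).

```python
def best_cut_for_congruence(scores, met_benchmark):
    """Return (cut, agreement rate) maximizing agreement with an external benchmark.

    scores: test scores; met_benchmark: booleans, True if the student met the
    external benchmark. Candidate cuts are the observed score values; ties go
    to the lowest candidate.
    """
    best_cut, best_rate = None, -1.0
    for cut in sorted(set(scores)):
        agree = sum((s >= cut) == m for s, m in zip(scores, met_benchmark))
        rate = agree / len(scores)
        if rate > best_rate:
            best_cut, best_rate = cut, rate
    return best_cut, best_rate


def on_track_with_band(score, cut, band):
    """Treat a score as on track if it is at or above the cut minus the error band."""
    return score >= cut - band
```

For example, with scores `[200, 210, 220, 230, 240, 250]` and benchmark outcomes `[False, False, False, True, True, True]`, the search returns a cut of 230 with an agreement rate of 1.0.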


In addition to the Welch and Dunbar (2011) study, both ACT and the College Board are using 
assessments at lower grade levels to assess college readiness. ACT has readiness benchmarks on 
its EXPLORE and PLAN assessments for grades 8 and 10, and the College Board recently introduced 
the ReadiStep exam for grade 8 and has long used the PSAT in Grade 10. The ACT benchmarks for 
EXPLORE and PLAN were set by retrospective analysis of students who took EXPLORE, PLAN, and the 
ACT. 


Another study that could be conducted is to have teachers classify their students regarding whether 
each student is prepared for the knowledge and skills to be taught at the next grade level. Although 
subsequent-grade-level preparedness is different from college readiness, it is likely that these two
variables would be strongly related. Thus, the classification consistency between teachers’ ratings 
and students’ “on track” classifications could provide useful validity evidence. For this type of study, 
teachers would have to be familiar with the curricula taught in the subsequent grade. We also 
recommend gathering data on teachers’ confidence in the rating that they make for each student. 
Such data would be an important validity check before computing classification consistency and 
could be used to delete the data for teachers who were not confident in making their preparedness 
ratings for some or all students. 
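
A minimal sketch of this analysis is shown below. It first drops ratings made with low confidence and then computes the proportion of agreement between teachers' ratings and the "on track" classifications. Cohen's kappa is added here as one common chance-corrected index; the research agenda does not name a specific statistic. The records, the 1-4 confidence scale, and the threshold are all hypothetical.

```python
def agreement_and_kappa(teacher_ready, test_on_track):
    """Return (proportion agreement, Cohen's kappa) for two yes/no classifications."""
    n = len(teacher_ready)
    p_obs = sum(t == s for t, s in zip(teacher_ready, test_on_track)) / n
    p_teacher = sum(teacher_ready) / n
    p_test = sum(test_on_track) / n
    p_chance = p_teacher * p_test + (1 - p_teacher) * (1 - p_test)
    kappa = (p_obs - p_chance) / (1 - p_chance)  # undefined if p_chance == 1
    return p_obs, kappa


# Hypothetical records: (teacher says prepared, test says on track, confidence 1-4).
records = [
    (True, True, 4), (True, False, 3), (False, False, 4),
    (False, False, 3), (True, False, 1), (False, True, 2),
]
MIN_CONFIDENCE = 3  # hypothetical threshold for keeping a rating

confident = [r for r in records if r[2] >= MIN_CONFIDENCE]
p_obs, kappa = agreement_and_kappa(
    [r[0] for r in confident], [r[1] for r in confident]
)
```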


Validating “On Track” Based on Testing Consequences. Providing “on track” and other achievement 
level classifications for students in grades 3-8 is likely to have consequences for students, teachers, 
and instruction. At the student level, one potential negative consequence is promoting low academic 
self-esteem for students who are classified as below “on track.” Such negative feelings could lead to 
“self-fulfilling prophecies” where students begin to believe that they are not smart or not capable of 
graduating high school. Student surveys and tracking dropout rates over time (Rabinowitz, 
Zimmerman, & Sherman, 2001) are two ways that this and other consequences could be measured. 
The “on track” designation could also have the intended positive consequence of early identification 
and remediation of students classified as below “on track.” Therefore, following up on the 
instructional decisions that are made for these students is another area of study that would provide 
important validity evidence. Validity evidence for this purpose based on testing consequences should 
also involve gathering data from teachers via interviews, focus groups, or surveys to assess their 
perceived utility of these classifications and how it has affected their instruction. The consistency of 
these impressions and effects on instruction across grades should be studied. 


Validating “On Track” Based on Procedural Evidence. Procedural evidence for standard setting refers 
to documentation and justification of all of the decisions and actions associated with a standard 
setting study. These decisions and actions were previously summarized in Table 5 (pp. 20-21), and
include selection of the standard setting panelists, justification of the standard setting method, 
training of panelists and other tasks associated with successful implementation of the method, 
analyzing the data, and assessing panelists’ confidence in their ratings and the process. Justification 
of the standard setting method will be important for the Smarter Balanced assessments, as some 
methods, such as the widely used Bookmark method, have been shown to have serious deficiencies 
(Davis-Becker, Buckendahl, & Gerrow, 2011; Reckase, 2006a, 2006b). Procedural evidence must be 
comprehensively documented, and should include surveys of panelists and others involved in the 
process. Standard setting reports for NAEP, such as those by ACT (2005a, 2005b, 2005c) are 
excellent examples of comprehensive documentation of standard setting that provides procedural, 
internal, and external validity evidence. 




Validating College and Career Readiness Benchmarks 


The third purpose statement for the summative assessments specifies college and career readiness. 
For the purposes of this research agenda, we assume that the knowledge and skills associated with 
college and career readiness have substantial overlap, as suggested by recent research (e.g., 
American Diploma Partnership, 2004; ACT, 2006), and so we focus on validating the college 
readiness benchmark. However, this assumption is based on convenience rather than research, 
since others have argued that the benchmarks for college and career readiness will be very different 
(Camara, in press; Loomis, 2011). Nevertheless, the methods described here for validating college 
readiness would carry over to the validation of career readiness, should appropriate external criteria 
for career readiness be identified. 


Validating “College and Career Ready” Based on Content Validity Evidence. Up to this point, we have 
twice discussed validity evidence based on test content—first for purpose 1, and second with respect 
to students being “on track” for college readiness (purpose 2). The same studies apply here for 
validating the “college and career ready” inference based on the grade 11 summative assessments.
This readiness designation implies acquisition of knowledge, and mastery of specific skills, 
considered necessary for success in college and careers and stipulated in the CCSS. Therefore, the 
content validity studies described earlier for purpose 1 are relevant here, and their findings should 
inform the validity argument for validating the college and career readiness standard. The additional 
evidence required for readiness is evidence that these standards are, in fact, the appropriate 
prerequisite skills in math and ELA that are needed to bypass remedial college courses and be ready 
to successfully begin postsecondary education or a career. The recent report by Conley et al. (2011) 
represents important evidence to support that assumption. Similarly, Vasavada, Carman, Hart, & 
Luisser (2010) found strong alignment between College Board assessments of college readiness 
and the CCSS. 


Other validity evidence that is based on test content and that will be used in the validity argument for 
the college and career readiness determination includes content overlap (alignment) studies that will 
be done to gauge the similarity of knowledge and skills measured across the summative 
assessments and external assessments that are used to evaluate the readiness standards. 
Postsecondary admissions tests (e.g., ACT, SAT) and college placement tests (e.g., ACCUPLACER, AP, 
Compass) will be used in concurrent and predictive validity studies, and so the overlap of skills 
measured must be documented to properly interpret the results. The National Assessment Governing 
Board (NAGB) recently began a program of research in this area to set college and career 
benchmarks on the grade 12 NAEP assessments. Its research agenda began with comprehensive 
alignment studies that evaluated the overlap of NAEP and external assessments (Loomis, 2011; 
NAGB, 2010). 


Validating “College and Career Ready” Based on Internal Structure Evidence. The previous 
descriptions of validity evidence based on internal structure for the “on track” student classification 
(i.e., estimates of DC and DA, review of the conditional standard error of measurement around the 
cut score, estimates of the standard error of the cut scores derived from the standard setting 
studies) are equally important for validating the college and career readiness classifications of 
students. These estimates and studies were described in previous sections, and so their descriptions 
are not repeated here. 


Validating “College and Career Ready” Based on Relations with Other Variables. In considering 
validating the college readiness achievement level standards on the Smarter Balanced summative 
assessments, we focus on validity evidence based on relations to external variables because, as 
Camara (in press) pointed out, “Given the intended purposes of [college and career readiness] 
assessments, if performance levels and benchmarks are inconsistent with empirical data of 
performance in college and career-training programs, they will not only lack credibility but would 
raise concerns about the validity of the interpretive argument.” 




A college- and career-ready standard implies that students who meet this standard have the 
prerequisite academic knowledge and skills to succeed in college or in a career. Given that there are
currently existing standards for college readiness,⁷ the readiness classifications based on the
Smarter Balanced summative assessments should be congruent with these other standards, 
assuming that these external standards accurately measure college readiness. The degree to which 
current college readiness benchmarks are consistent with the Smarter Balanced readiness 
standards needs to be studied. These studies could be used (a) to empirically set the Smarter 
Balanced readiness standards, (b) as part of the standard setting process, or (c) to validate the 
standards after they have been set by other means. 


Validity evidence based on relations to other variables for the purpose of classifying students as 
college ready should involve both correlation/regression studies and classification consistency 
analyses. In these analyses, scores from the summative assessments will be correlated with, used 
as predictors of, and cross-tabulated with other measures of college readiness. To conduct these 
analyses, appropriate external measures must be identified, defined, and evaluated for validation 
purposes. In addition, different research designs should be considered. Design options include: 


• Concurrent studies where students take both the summative assessments and external
  assessments;

• Predictive studies where students take the summative assessments and their future college
  performance is compared in retrospective fashion; and

• Embedded item designs where summative assessment items are embedded in other
  assessments of college success, and vice versa.


Defining “college success” is not straightforward, and so we recommend that several different 
variables be used, and studied, as outcome variables for college readiness. Camara (in press) listed 
seven criteria that have been or could be used for setting or evaluating college readiness 
benchmarks on Smarter Balanced or Partnership for Assessment of Readiness for College and 
Careers (PARCC) assessments. These are: 


• Persistence to second year;
• Graduation or completion of a degree or certification program;
• Time to degree completion (e.g., 6 years to earn a bachelor’s degree);
• Placement into college credit courses;
• Exemption from remediation courses;
• College grades in specific courses; and
• College grade point average.


Camara also noted that the most common criterion is college grades, either first-year grade point 
average (GPA) or grades in specific first-year courses. For example, in setting the college readiness 
benchmark on the ACT, grades in specific first-year courses were used (Allen & Sconing, 2005), but 
to set the same benchmark on the SAT, Wyatt, Kobrin, Wiley, Camara, and Proestler (2011) used 
first-year GPA. 


7 We use readiness here to refer to the academic skills in math and reading, not the more general readiness
criteria that include non-cognitive variables such as contextual skills and academic behaviors (Conley, 2007). 




Current college readiness benchmarks set on educational tests. Several studies have been used to 
evaluate or set college readiness benchmarks on tests. Examples of testing programs that have set 
or evaluated college readiness benchmarks include: 


• ACCUPLACER
• ACT
• Advanced Placement exams
• COMPASS
• Current statewide high school tests (end-of-course or graduation tests)
• Early Assessment Program (California)
• EXPLORE
• International assessments (e.g., PISA, TIMSS)
• International Baccalaureate
• NAEP
• PLAN
• PSAT/NMSQT
• ReadiStep


A recent report by NAGB (Fields & Parsad, 2012) found that the most common assessments used by 
postsecondary institutions to evaluate entering students for remedial courses in math were the ACT, 
SAT, ACCUPLACER (Elementary Algebra and College Level Math), and COMPASS (Algebra, College 
Algebra). For reading, the most common assessments were the ACT, SAT, ACCUPLACER (Reading 
Comprehension), ASSET (Reading Skills), and COMPASS (Reading). 


Examples of some of the studies that have been done using these tests, the readiness standards 
that were set on each, and relevant citations are presented in Table 6. Camara (2012) described 
research in this area as consisting of three steps: First, determine the appropriate outcome variable 
for college success (e.g., first-year GPA). Second, determine the appropriate criterion of “success” on 
the outcome variable (e.g., 65% chance of a B-). Third, determine the appropriate probability of 
success. These steps will be important considerations in designing validity studies for the Smarter 
Balanced summative assessments. 




Table 6. Current College Readiness Benchmarks 


Test                      Criterion                        Benchmark     Comments/Citations

ACT English               .75 probability of C and         [illegible]   Allen & Sconing (2005)
ACT Reading               .50 probability of B
ACT Math

SAT Composite             .65 probability of first-year    [illegible]   Wyatt et al. (2011)
SAT-Quantitative          GPA of B- (2.67)
SAT-Reading
SAT-Writing

Advanced Placement (AP)   [illegible]                      [illegible]   Relevant tests include Calculus AB,
                                                                         Calculus BC, English Language &
                                                                         Composition, English Literature &
                                                                         Composition, and Statistics.

COMPASS                   .75 probability of C and         [illegible]   ACT (2010)
                          .50 probability of B

EXPLORE                   .75 probability of C and         [illegible]   ACT (2010)
                          .50 probability of B

[one further row illegible]


The studies reported in Table 6 primarily used regression methods to find the test score that best 
distinguished students who met or did not meet some operationally defined criterion of college 
success.⁸ For the ACT research, the criterion used was the test score associated with a .75
probability of earning a C or a .50 probability of earning a B in specific college courses (e.g., English 
composition, college algebra). For the SAT research, the criterion used was the test score associated 
with a .65 probability of earning an overall first-year GPA of B- (2.67). The ACT studies used linear 
regression, whereas the SAT studies used logistic regression. The SAT studies also included validity 
evidence based on external variables, specifically rigor of high school courses, AP exam scores, and 
high school GPA, to support the SAT readiness benchmarks (Wyatt et al., 2011). In addition to the 
studies reported in Table 6, Fields and Parsad (2012) conducted a comprehensive survey of cutoff
scores on postsecondary math and reading placement tests. The mean cutoff scores, and the 
variability in these scores across institutions, were reported. These mean cutoff scores could be used 
as validation criteria for the Smarter Balanced college readiness standards. Other readiness criteria 
include specific cutoff scores used by state university systems (e.g., California and Texas have 
readiness criteria based on the ACT, the SAT, and in-state assessments), and the International 
Baccalaureate exams (compensatory score of 24 across six assessments). 
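
The regression-based approach to locating a benchmark can be illustrated with a small sketch. It fits a one-predictor logistic regression by Newton-Raphson and then solves for the test score at which the modeled probability of success reaches a target value (.65 is used here, matching the criterion described for the SAT research). The data, the centered score scale, and the function names are hypothetical, and the sketch assumes the outcome is not perfectly separated by the score, since the fit would not converge in that case.

```python
import math


def success_probability(score, b0, b1):
    """Modeled probability of success at a given score."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = success_probability(xi, b0, b1)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def benchmark_score(b0, b1, target_prob):
    """Score at which the modeled probability of success equals target_prob."""
    logit = math.log(target_prob / (1.0 - target_prob))
    return (logit - b0) / b1


# Hypothetical centered test scores and success indicators (1 = met the criterion).
x = [-2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
benchmark = benchmark_score(b0, b1, 0.65)
```

Changing `target_prob` or the definition of success (for example, a B versus a C) moves the resulting benchmark, which is why those choices need to be justified before the scores are used as validation criteria.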


8 Equipercentile equating could also be used, and may be preferable in some situations. 




In addition to establishing college readiness benchmarks on admissions tests, research has also 
been conducted to see how these readiness benchmarks could inform setting readiness standards 
on other assessments. For example, the Texas Education Agency commissioned a series of studies 
to set and evaluate college readiness standards using the State of Texas Assessments of Academic 
Readiness (STAAR). In fact, in establishing the new STAAR tests, the Texas legislature legislated that 
“validity studies be conducted to evaluate the empirical links between student performance on the 
STAAR assessments and specific assessments measuring similar constructs, and that these links be 
used to inform the standard-setting process” (LaSalle et al., 2012, p. 2). These studies are 
particularly relevant to Smarter Balanced because the STAAR assessments involve on-target 
readiness standards below high school and certifying college readiness at the high school level. 


Rather than directly using external assessments to set readiness benchmarks on the STAAR exams, 
Texas used external data to set “landmarks,” or cut points, on the STAAR score scale that 
corresponded to important cut scores on the external assessments. Examples of external 
assessments that were used for this purpose included the previous statewide exams in Texas, a 
placement test used at the University of Texas, the ACT and SAT benchmarks, and the ACCUPLACER 
Elementary Algebra exam. For the previous statewide end-of-course tests, equipercentile linking was 
used to establish concordance tables across pairs of tests. For the readiness benchmarks 
established on the external assessments, logistic or linear regression was used to “map” the 
external benchmarks onto the STAAR score scales. Linear regression was also used to set other 
landmarks based on high school course grades (e.g., B or better) and probability of success in a 
relevant college course (e.g., C or better in college algebra). See Keng, Murphy, and Gaertner (2012) 
for a more complete description of these studies. 
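
Equipercentile linking can be sketched in a simplified, unsmoothed form: find the percentile rank of a score on one test, then find the score on the other test that has the same percentile rank. The version below uses mid-percentile ranks and linear interpolation between observed score points. It is an illustration only, not the procedure used in the Texas studies, and operational linking would normally include smoothing and a defined data collection design.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Mid-percentile rank (0-100) of x within a sorted list of scores."""
    below = bisect_left(sorted_scores, x)
    at = bisect_right(sorted_scores, x) - below
    return 100.0 * (below + 0.5 * at) / len(sorted_scores)


def equipercentile_equivalent(x, scores_x, scores_y):
    """Score on test Y with the same percentile rank as score x on test X."""
    xs, ys = sorted(scores_x), sorted(scores_y)
    target = percentile_rank(xs, x)
    candidates = sorted(set(ys))
    ranks = [percentile_rank(ys, y) for y in candidates]
    # Clamp to the observed range of Y at the extremes.
    if target <= ranks[0]:
        return candidates[0]
    if target >= ranks[-1]:
        return candidates[-1]
    # Interpolate linearly between the two Y score points that bracket the target.
    for i in range(len(candidates) - 1):
        r0, r1 = ranks[i], ranks[i + 1]
        if r0 <= target <= r1:
            y0, y1 = candidates[i], candidates[i + 1]
            return y0 + (y1 - y0) * (target - r0) / (r1 - r0)
```

Applying the function to every score point on test X yields a concordance table of the kind described above.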


Based on several studies of these external criteria, “landmarks,” or benchmarks, were established 
on the STAAR score scale, and these landmarks were used to establish “neighborhoods” within 
which it seemed reasonable (to the policymakers who reviewed these results) to set the college
readiness standard and other standards. The score scale annotated with the landmarks and 
neighborhoods was used to encourage standard setting panelists to set their standards within the 
neighborhoods, since the score scale range defined by each neighborhood contained the external 
readiness standards and other relevant information that would support the standard set in that 
range. Keng et al. (2012) described this process as “evidence-based standard setting” (p. 4; see 
also O’Malley, Keng, & Miles, 2012). 


A fictitious example of how external data could be used to inform the college and career readiness 
standard setting process using neighborhoods based on external data is presented in Figure 7. In 
this figure, test scores related to college readiness from two states (California and Oregon), the ACT 
and SAT readiness benchmarks, and the passing score for the GED Math test are all mapped onto 
the score scale for the grade 11 Smarter Balanced summative math assessment. The score 
corresponding to chance performance is also indicated. Using external data in this way can build 
validation criteria into the standard setting process. 




Figure 7. Example of Using External Data to Establish a Reasonable Interval (Neighborhood) for Standard 
Setting 


[Figure 7 image not reproduced. Landmarks labeled along the Smarter Balanced Math Score Scale: chance score, GED Math passing score, OR Math graduation test passing score, CA EAP Math readiness score, ACT readiness, and SAT readiness.]


Recommended studies based on relations to external variables. The previous section described 
some options for conducting validity studies based on relations to external variables and 
summarized some of the research that has already been done in this area. To relate current college 
readiness standards and other pertinent information to the grade 11 Smarter Balanced summative 
assessments, three types of studies are possible. The first two types of studies are concurrent 
validity studies. In the first variation, students would take both Smarter Balanced and external 
assessments at around the same point in time. For example, grade 11 students could take the 
Smarter Balanced summative assessments, or a subset of items from them (e.g., in the pilot study), 
and the SAT or ACT, at a reasonable point in time (e.g., March). Regression or equipercentile 
methods could be used to determine the Smarter Balanced scores that corresponded to the SAT or 
ACT readiness benchmarks. The second type of concurrent validity study would involve college 
students taking Smarter Balanced assessments (or subsets of items) near the end of a relevant 
course, and their final course grades could be used as the validation criterion. The Smarter Balanced 
scores that are associated with the pre-established readiness criterion (e.g., grade of B-) could be 
established via regression or equipercentile procedures, or probability tables could be set up to 
relate the Smarter Balanced scores to specific grades. The third type of study that could be 
conducted would be a retrospective study where students who took the Smarter Balanced 
assessments would be followed longitudinally to see how they perform in college (see, for example, 
D’Agostino & Bonner, 2009). 
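
The probability tables mentioned above could be built along the following lines. This is a sketch under stated assumptions: scores are grouped into fixed-width bands, course grades are expressed as grade points with B- taken as 2.7 on a 4.0 scale, and the records are invented for illustration. The function name `probability_table` and its parameters are hypothetical.

```python
from collections import defaultdict


def probability_table(records, band_width, criterion_met):
    """Proportion of students meeting the grade criterion within each score band.

    records: iterable of (scale_score, grade_points) pairs.
    band_width: width of each score band on the scale.
    criterion_met: function returning True when a grade meets the criterion.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [n meeting criterion, n total]
    for score, grade in records:
        band = (score // band_width) * band_width
        counts[band][1] += 1
        if criterion_met(grade):
            counts[band][0] += 1
    return {band: met / total for band, (met, total) in sorted(counts.items())}


# Hypothetical (scale score, final course grade points) records.
records = [
    (2410, 1.7), (2440, 2.3), (2460, 2.0),
    (2520, 2.7), (2530, 3.0), (2570, 2.3),
    (2610, 3.3), (2650, 3.7), (2680, 2.7), (2690, 4.0),
]

# Assumed criterion: a grade of B- (2.7 grade points) or better.
table = probability_table(records, band_width=100, criterion_met=lambda g: g >= 2.7)
for band, prob in table.items():
    print(band, round(prob, 2))
```

A table of this kind shows, for each score band, the observed share of students who reached the criterion grade, which is one way to see where on the scale success becomes likely.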


Threats to the validity of these studies include differential motivation effects across the Smarter 
Balanced and external assessments, potentially non-representative samples of students due to the 
self-selection of external assessments, and a lack of overlap in the constructs measured by the 
Smarter Balanced and external assessments. Different grading standards and different admissions 
standards across colleges and universities, and across different types of institutions (public, private, 
two-year, four-year) also present problems. Nevertheless, these issues can be considered and 
discussed when interpreting the results. Surveys or interviews of students participating in these 
studies could help understand these students’ motivation to do well (Haertel, 1999). 


The most practical course of action to gather external data to validate the Smarter Balanced college 
readiness standards is to take advantage of tests already taken by grade 11 students, such as the 
ACT, SAT, and AP exams, and relate them to their scores on the summative assessments. 
Supplementary studies would need to evaluate the content overlap of these assessments and 
students’ motivation to do well on the Smarter Balanced assessments. Assuming sufficient content 
overlap and motivation, benchmarks can be set to inform the establishment of the college readiness 
standards on the Smarter Balanced assessments (as done in Keng et al., 2012), and longitudinal 
analysis can be done at a later point in time to evaluate the standards and possibly revise them if 
necessary. The key information to gather is the degree to which students who reached the Smarter 
Balanced readiness standards were successful in college. Camara and Quenemoen (2012) 
suggested that the decision consistency of the ready/not-ready and successful/not successful in 
college classifications should be broken down across different types of institutions. 
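
One simple way to express this breakdown is the proportion of students whose readiness classification agrees with their later college outcome, computed separately for each institution type. The sketch below assumes both classifications are already coded as true/false for each student; the institution labels and records are hypothetical, and the function name is illustrative only.

```python
from collections import defaultdict


def decision_consistency(records):
    """Proportion of agreeing classifications per institution type.

    records: iterable of (institution_type, ready, successful) where the
    last two values are booleans.
    """
    tallies = defaultdict(lambda: [0, 0])  # type -> [n agreeing, n total]
    for institution, ready, successful in records:
        tallies[institution][1] += 1
        if ready == successful:
            tallies[institution][0] += 1
    return {inst: agree / total for inst, (agree, total) in tallies.items()}


# Hypothetical student records.
records = [
    ("two-year", True, True), ("two-year", False, False),
    ("two-year", True, False), ("two-year", False, False),
    ("four-year", True, True), ("four-year", True, True),
    ("four-year", False, True), ("four-year", False, False),
    ("four-year", True, False),
]

consistency = decision_consistency(records)
print(consistency)
```

Simple agreement does not correct for chance agreement; an index that does so could be substituted where that matters.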


It is likely that data-sharing agreements that maintain student anonymity can be worked out between 
the Consortium and external examination programs, such as ACT and the College Board, and among 
state colleges and universities within the Consortium. In addition, as Camara and Quenemoen 
(2012) point out, the National Student Clearinghouse maintains enrollment records for a vast 
majority of postsecondary institutions and can be used to track retention and graduation rates that 
will be useful for evaluating the readiness standards. The percentages of students who are 
“Proficient” on the grade 12 NAEP Math and Reading assessments will also be evaluated with 
respect to the percentages of students who are classified as “college ready” on the respective 
Smarter Balanced assessments. Should the NAEP grade 12 results ever be reported at the state 
level, within-state NAEP/Smarter Balanced comparisons would be informative. 


Validating “College and Career Ready” Based on Testing Consequences. The college and career 
readiness standard on the Smarter Balanced summative assessments is intentionally integrated 
with the “on track” standards set at the lower grade levels. The intended consequence of this system 
is better preparation of students so that they are prepared for college or careers by the time they 
graduate high school. This intended consequence can be measured by analyzing trends in college 
completion and remedial course enrollments over time, and by surveying secondary and 
postsecondary educators about students’ proficiencies. However, validity evidence for the college 
and career readiness designation should also investigate unintended consequences, such as 
unanticipated changes in instruction, diminished morale among teachers and students, and 
increased pressure on students that may lead to dropout, or to pursuing college majors and careers 
that are less challenging. To evaluate these potential consequences, teacher surveys of enacted 
curriculum, student surveys of career aspirations, and psychological assessments of anxiety and 
academic self-concept could be conducted. 


The recommended studies based on testing consequences that will target the college and career 
readiness purposes should include teacher surveys regarding changes in student achievement and 
preparedness over time and changes in teachers’ instruction over time. We also recommend that 
students be surveyed regarding college and career aspirations. Student and teacher samples that 
are representative at the state level would suffice for these studies. If time and resources permit, 
assessing the anxiety levels of students regarding their likelihood of obtaining college or career 
readiness, and their academic self-concept, would also be helpful. Validity evidence based on the 
consequences of the college and career readiness standard should also involve analysis of 
secondary and postsecondary enrollment and persistence, changes in course-taking patterns over 
time, and teacher retention for teachers in math and ELA. 




Summative Assessment Purpose 4: 


Provide valid, reliable, and fair information about students’ annual progress toward college 
and career readiness in ELA and mathematics. 


As indicated in Table 1, validity evidence to support the use of the summative assessments for 
providing information about students’ annual progress should be based on test content, internal 
structure, relations with external variables, and testing consequences. Studies related to test content 
need to evaluate the degree to which similar standards are measured across grades and the 
consistency of the construct across grades. Studies based on internal structure should evaluate the 
validity of the vertical scale used to measure progress over time. Studies involving relations with 
external variables are needed to confirm that the progress observed on the Smarter Balanced scale 
is mirrored by other measures of academic achievement. Finally, studies based on testing 
consequences should confirm that the measures of annual progress have a positive effect on 
instruction and student learning. 


The most straightforward way to measure changes in students’ proficiencies over time is to have 
scores from assessments at different points in time on a common scale. The physical analogy is the 
bathroom scale that remains unchanged across different measurements of weights. Sometimes, 
however, even the bathroom scale needs to be recalibrated to confirm the zero point. With 
educational assessments, it is difficult to put scores from assessments at different time periods on 
the same scale, because the items administered to students at different points in time are not the 
same. At this juncture, the Smarter Balanced summative assessments are planned to be vertically 
equated across grades, which means that a single score scale will span the grades. A vertical scale 
facilitates measuring changes in students’ performance over time (Briggs, 2012; Kolen, 2011; Patz, 
2007). However, it is difficult to create a valid vertical scale. Challenges to vertical scaling include 
changes in the construct of math or ELA across grades, and differences in when material is taught 
across grades and schools (Tong & Kolen, 2007). Therefore, validity evidence to support measuring 
students’ progress toward college and career readiness should involve evaluation of the vertical 
scale across grades. 


Validity Studies Based on Test Content. Evaluations of the content measured across grades will be 
an important source of evidence for validating the appropriateness of the vertical scale for 
measuring students’ progress. First, this evaluation should assess whether there is overlap among 
the CCSS measured across adjacent grades (Patz, 2007). Next, the evaluation should review the 
common items that are used to form the vertical links across grades. SMEs should be asked whether 
the linking items are relevant to students in both grades and if they adequately represent the 
expected learning progressions. The content review should also assess the degree to which a 
common construct can be considered to hold across grades, or at least across adjacent grades. For 
example, do the anchor items that are used across grades measure CCSS that are appropriate for 
each grade? 


Validity Studies Based on Internal Structure. Most of the studies that should be conducted to 
evaluate the validity of the vertical scales underlying the summative assessments can be categorized 
as evidence of internal structure. These studies include dimensionality analyses and evaluation of 
item statistics, mean scores, and score distributions across grades. 


Dimensionality analyses. One important area of study is evaluation of the dimensionality of the 
assessment data, and of the degree to which the dimensionality is consistent across grades, or at 
least across adjacent grades. For example, if a single dimension is hypothesized to exist across 
grades, the degree to which the data for each grade are unidimensional, and the degree to which the 
same dimension holds across grades, should be studied. One way to conduct this analysis is using 
IRT residual analysis, as suggested earlier. The added layer of analysis would be evaluating the 
consistency of the fit across grades. Kolen (2011) noted that “even if the unidimensionality 
assumption does not strictly hold, the IRT model might provide an adequate enough summary of the 
data that the vertical scale is still useful” (p. 12). Other dimensionality assessment procedures, such 
as confirmatory factor analysis or bifactor analysis, could also be useful. 


The incomplete student-by-item data matrix that results from adaptive testing can cause problems 
for many dimensionality assessment procedures, such as exploratory and confirmatory factor 
analysis. Thus, assessing the dimensionality within and across grades within an IRT framework is 
probably most practical. In addition to residual analysis, both unidimensional and multidimensional 
IRT models can be fit to the data, and the difference between models can be tested for significant 
and practical improvement in fit to the data (Bock, Gibbons, & Muraki, 1988; Sireci, 1997). For items 
that are dichotomously scored, this analysis can be conducted using the TESTFACT software (Wilson, 
Wood, & Gibbons, 1991). To assess multidimensionality using both dichotomous and polytomous 
items, some specialized software may be needed. 


Analysis of statistics across grades. The establishment of a vertical scale implies an increase in the 
difficulty of the assessments as grade increases and higher proficiency of students in higher grades 
relative to lower grades. At the item level, it is assumed that students at a higher grade level will 
have a higher probability of correctly answering an item than students at a lower grade level. These 
assumptions can be checked to evaluate the validity of the scale. Factors such as when students are 
taught specific knowledge and skills (i.e., opportunity to learn) and difference in time between 
instruction and assessment can cause “reversals” where students at higher grade levels perform 
worse than students at lower grade levels. Such reversals can be a problem when a common item 
approach is used to link the assessments across grade levels. Therefore, an additional study is a 
comparison of where the items “land” on the vertical scale versus the grade levels for which they 
were written. For example, if items written for a grade 6 assessment have IRT difficulty estimates 
that put them in the general range of grade 5 or grade 7 items, there will be a disconnect between 
the intended content at each grade level and the actual scale properties. 
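
The item-level check for reversals described above can be sketched as follows, assuming the proportion of students answering each linking item correctly is available for two adjacent grades. The item identifiers and values are hypothetical, and a real analysis would also consider sampling error before treating a small difference as a reversal.

```python
def find_reversals(p_lower, p_upper):
    """Linking items answered correctly less often in the upper grade.

    p_lower, p_upper: dicts mapping item id -> proportion correct in the
    lower and upper grade, respectively. Only items present in both grades
    are compared.
    """
    common = p_lower.keys() & p_upper.keys()
    return sorted(item for item in common if p_upper[item] < p_lower[item])


# Hypothetical proportions correct on items shared by grades 5 and 6.
grade5 = {"item01": 0.42, "item02": 0.55, "item03": 0.61, "item04": 0.30}
grade6 = {"item01": 0.58, "item02": 0.49, "item03": 0.70, "item04": 0.28}

reversals = find_reversals(grade5, grade6)
print(reversals)
```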


Kolen (2011) and Patz (2007) suggested several analyses that could be used to evaluate the validity 
of a vertical scale (see also Kolen & Brennan, 2004). These analyses include: 


• correlation of item difficulties across grade levels 
• a progression in test difficulty of test characteristic curves across grades 
• analysis of item difficulties across grades 
• comparison of mean scores across grades 
• comparison of scale scores associated with proficiency levels across grades 
• comparison of overlap of proficiency distributions across grades 
• comparison of variability in test scores within and across grades 


Validity evidence for vertical scales that are appropriate for measuring students’ annual progress 
would include a lack of reversals of item difficulties across grades, anticipated separation of means 
and proficiency distributions across grades, and sensible patterns of variability within and across 
grades. With respect to comparison of score means across grades, Patz (2007) suggested, “For 
sufficiently large and diverse samples of students, scale score means would be expected to increase 
with grade level, and the pattern of increase would be expected to be somewhat regular and not 
erratic” (pp. 17-18). 


With respect to evaluating patterns of variability, Kolen (2011) noted: 


Within grade variability indices typically are either similar across grades or increase 
as grade increases. Either of these patterns seems reasonable. Sometimes within 
grade variability indices decrease substantially as grade increases, which is 
sometimes referred to as scale shrinkage. Scale shrinkage can be indicative of 
problems with IRT parameter estimation, in which case the vertical scaling 
procedures might need to be adjusted or the scale abandoned. (p. 12) 


In considering establishing a vertical scale for PARCC, Kolen noted, “PARCC might decide, based on 
the construct being assessed, that an acceptable vertical scale should display increasing mean 
scores from year to year, that the amount of growth is decelerating, and that the within grade 
variability is either approximately equal across grades or is increasing from grade to grade” (p. 21). 
These evaluation criteria are applicable to evaluation of the Smarter Balanced vertical scale for the 
summative assessments. 
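
The three criteria Kolen describes (increasing means, decelerating growth, and stable or increasing within-grade variability) lend themselves to a simple descriptive check. The sketch below is one possible operationalization, not a procedure from the cited sources: the 10% tolerance used to decide what counts as substantial shrinkage is an arbitrary assumption, and the scale scores are invented.

```python
from statistics import mean, stdev


def vertical_scale_checks(scores_by_grade, shrink_tolerance=0.10):
    """Descriptive checks on a vertical scale.

    scores_by_grade: dict mapping grade -> list of scale scores.
    shrink_tolerance: assumed allowable proportional drop in the standard
    deviation from one grade to the next before flagging scale shrinkage.
    """
    grades = sorted(scores_by_grade)
    means = [mean(scores_by_grade[g]) for g in grades]
    sds = [stdev(scores_by_grade[g]) for g in grades]
    gains = [upper - lower for lower, upper in zip(means, means[1:])]
    return {
        "means_increase": all(gain > 0 for gain in gains),
        "growth_decelerates": all(
            later <= earlier for earlier, later in zip(gains, gains[1:])
        ),
        "no_scale_shrinkage": all(
            later >= earlier * (1 - shrink_tolerance)
            for earlier, later in zip(sds, sds[1:])
        ),
    }


# Hypothetical scale scores for three adjacent grades.
scores = {
    3: [2300, 2400, 2500],
    4: [2360, 2470, 2580],
    5: [2400, 2520, 2640],
}

checks = vertical_scale_checks(scores)
print(checks)
```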


In addition to analyses of item statistics and test scores across grades, Briggs (in press) claims that 
vertical scales should be validated by demonstrating that they possess interval scale level 
properties. This idea is new and has not seen wide application, but Briggs suggests the use of 
additive conjoint measurement to determine whether vertical scales have equal-interval properties, 
which he considers necessary for valid measurement of students’ annual progress. 


In addition to the previously mentioned studies, analyses of item parameter drift over time should 
also be conducted. These analyses involve recalibrating IRT item parameters in subsequent years 
and comparing them to their estimates in prior years. Such analyses could improve the anchors used 
in equating across years by eliminating anomalous items, or could identify items that have been 
compromised (i.e., security problems). 
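
A minimal sketch of such a drift screen appears below. It assumes the prior-year and current-year IRT difficulty estimates have already been placed on a common scale, and it flags items whose estimates differ by more than a threshold. The threshold of 0.3 logits, the item identifiers, and the parameter values are all hypothetical; operational programs typically use more formal criteria.

```python
def flag_drift(b_prior, b_current, threshold=0.3):
    """Items whose difficulty estimate changed by more than the threshold.

    b_prior, b_current: dicts mapping item id -> IRT difficulty estimate,
    assumed to be on a common scale. threshold is an assumed cutoff.
    """
    common = sorted(b_prior.keys() & b_current.keys())
    return [item for item in common if abs(b_current[item] - b_prior[item]) > threshold]


# Hypothetical difficulty estimates for four anchor items in two years.
prior_year = {"A": -1.0, "B": 0.0, "C": 0.5, "D": 1.2}
current_year = {"A": -0.9, "B": 0.1, "C": 0.6, "D": 2.1}

drifting = flag_drift(prior_year, current_year)
print(drifting)
```

Flagged items would then be reviewed before deciding whether to drop them from the anchor set or to investigate possible security problems.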


Validity Studies Based on Relations to Other Variables. To confirm that the summative assessments 
provide valid information about students’ annual progress in math and ELA, it would be good to 
compare students’ progress on these assessments with other measures of their achievement over 
the same time period. At a macro level, the aggregated progress of students over time could be 
compared to changes of students within a state on the NAEP math and reading assessments. On an 
individual student level, progress on the Smarter Balanced assessments could be compared to other 
standardized assessments that are on a vertical scale, such as the ITBS or the Measures of 
Academic Progress (Northwest Evaluation Association, 2005). 


In addition to concurrent validity evidence based on other tests, the degree to which the summative 
assessments are sensitive to instruction could also be studied to evaluate the degree to which the 
tests measure students’ annual progress. Teachers who more fully implement the CCSS into their 
instruction should have students who make greater progress on the summative assessments. 
D’Agostino, Welsh, and Corson (2007), for example, measured the degree to which teachers 
emphasized state academic standards in their teaching and compared these measures to students’ 
performance on the statewide test. They found a modest but positive relationship. A similar strategy 
could be implemented to evaluate the patterns of progress noted across classes on the summative 
assessments. Another way in which external data can inform the validation of the summative 
assessments as a progress measure is to have teachers rate the math and ELA progress made by 
their students within a year, and compare it to their progress as measured by the Smarter Balanced 
score scales. 


Validity Studies Based on Testing Consequences. The summative assessments are supposed to 
provide information regarding students’ annual progress so that their progress toward college and 
career readiness can be ascertained. If adequate progress is not found, it is likely that instructional 
changes will be made to support improved progress. Thus, validity evidence based on testing 
consequences should include surveys or interviews of teachers to understand the degree to which 
they find estimates of students’ progress helpful for targeting instruction to individual students and 
to their classes in general. In addition, if progress measures are used to alter the instruction for a 
student—for example, placing the student in supplementary instruction or an after-school program— 
the degree to which these actions are associated with improved progress should be studied 
(Shepard, 1993). 




Another important study of testing consequences related to measuring progress is the degree to 
which progress is similar across subgroups of students. If students from different ethnic 
backgrounds, socioeconomic statuses (SES), or disability statuses are progressing at different rates, 
the reasons for such differential progress should be studied. It may be that students who initially 
perform low on the assessments have more opportunity to exhibit progress. In any case, patterns of 
progress across subgroups should be studied to ascertain whether these patterns are expected 
given the student characteristics, or if they reflect some insensitivity of the assessments to properly 
capture progress or some type of deficiency in the scale properties. 


Summative Assessment Purpose 5: 


Provide valid, reliable, and fair information about how instruction can be improved at the 
classroom, school, district, and state levels. 


As indicated in Table 1, for the Summative Assessments to provide information that will improve 
instruction, the content of the assessment must adequately measure the intended CCSS, and 
teachers, administrators, and other educators must appropriately act upon this information to tailor 
instruction accordingly. The validity studies based on test content that were described earlier for 
purposes 1 through 4, and the studies of testing consequences that were described for purposes 2 
through 4, would all provide evidence regarding the degree to which the assessment results are 
instructionally relevant. The gathering of additional validity evidence to support purpose 5 will be 
similar to the studies suggested later in this report for the interim assessments and formative 
assessment resources, because these components are designed to work together to improve 
instruction. Many of these studies fall under the category of validity evidence based on testing 
consequences; one study based on relations to other variables, which was already mentioned with 
respect to purpose 4 (a study of sensitivity of the summative assessments to instruction), is also 
relevant to purpose 5. 


As noted earlier, teachers who more fully implement the CCSS into their instruction should have 
students who make greater progress on the summative assessments (D’Agostino et al., 2007). 


Validity Studies Based on Testing Consequences. The provision of summative assessment 
information to improve instruction will most likely come from the score reports associated with these 
assessments. Therefore, the evaluation of testing consequences relative to this purpose will focus 
largely on the utility of these score reports. An analysis of classroom artifacts will also provide 
important evidence, as will the types of surveys, interviews, and focus groups associated with the 
studies mentioned earlier for purposes 1 through 4. 


Studies on effectiveness of summative assessment score reports. According to the score reporting 
RFP (RFP-15), Smarter Balanced has planned a wide and comprehensive variety of score reports to 
support purpose 5. There will be both static score reports and dynamic score reports that are 
interactive. Summative assessment results will be reported at the total score and claim levels for 
both subject areas, and reports will be available for both individual students and aggregate groups. 
The comprehensive nature of these reports, and their online access and variety, should provide 
actionable data to improve instruction at the classroom, school, district, and state levels. Research 
studies should be conducted to confirm that these intended consequences are occurring. 


RFP-15 requires gathering feedback from potential users as score reports are being developed. 
Documentation regarding these reports should be reviewed to see what changes were made on the 
basis of this feedback. In addition, once the reports are operational, studies should be conducted to 
ascertain how well teachers, administrators, parents, students, and other stakeholders (e.g., 
legislators, journalists) understand the reports and find them useful. These studies should include 
surveys, focus groups, and interviews. In addition to gathering stakeholders’ impressions of the 
reports, their understanding of the information contained in the reports should be tested (Wainer, 
Hambleton, & Meara, 1999). The actions that teachers take based on the score reports should also 
be documented and evaluated for appropriateness (Bennett, 2010). In addition to assessing users’ 
understanding and use of the reports, surveys should also be used to inquire about ease of 
navigating the system, timeliness of data, and additional features that users would like to see. 


Analyses of usage statistics should also be conducted to determine the most popular reports and to 
confirm that all reports created are being used. The different types of reports that users create 
should also be reviewed. The most commonly used and least commonly used reports could be 
targeted for discussion in focus groups to (a) ensure that users are making appropriate inferences 
from the reports, (b) ensure that users are taking appropriate actions based on the reports, and (c) discover 
how the least-accessed reports could be improved to make them more useful, or to make users 
aware of them. 


To maximize utility of the reports, users or “data coaches” should be trained on how to access them 
and use them. In fact, the Peer Review Guidance (U.S. Department of Education, 2009b) stated that 
“Training on interpretation of results is required [and] must provide evidence on how educators can 
interpret results and then use them for proper decision making” (p. 69). Thus, the effectiveness of 
the training should also be evaluated. 


Studies of textbooks and classroom artifacts. Another way in which the effects of the summative 
assessments on instruction can be evaluated is by looking at changes in textbooks and instructional 
practices before, during, and after implementation of the assessments. In addition to the surveys 
and interviews previously discussed, classroom artifacts such as lesson plans, student handouts, 
classroom assessments, homework, syllabi, and teacher logs (e.g., Silk, Silver, Amerian, Nishimura, 
& Boscardin, 2009; Tomlinson & Fortenberry, 2008) should be studied. 


Summative Assessment Purpose 6: 


Provide valid, reliable, and fair information about students’ ELA and mathematics 
proficiencies for federal accountability purposes and potentially for state and local 
accountability systems. 


Results from the summative assessments will include scale scores at the total score and claim 
levels, and achievement level classifications in each subject area. The achievement level results 
could be used as they are currently employed in statewide testing programs for federal accountability 
purposes under NCLB. In addition, students’ progress over time could be used in growth models for 
other accountability purposes, some of which may be for federal accountability and some at the 
state or local levels. The Smarter Balanced principle of “responsible flexibility” (Smarter Balanced, 
2010, p. 5) is consistent with the idea of providing valid, reliable, and fair information that can be 
used for federal accountability in uniform fashion across all participating states, but also allows for 
states to use information from the summative assessments in their statewide and local 
accountability systems. 


Smarter Balanced cannot assume the responsibility for validating all of the potential uses of the 
summative assessments at the state and local levels, but the responsibility for validating 
accountability at the federal level should be included in the research agenda. In particular, the 
metric of “percent proficient” at the total student population level and at the subgroup level should 
be validated, as well as any other aggregate statistics used for federal accountability. 


Percent proficient is currently a primary accountability criterion in NCLB, which also requires states 
to set at least three proficiency levels. In considering the reporting of achievement level results in 
California, a technical advisory committee led by Lee Cronbach (Select Committee, 1994/1995) 
recommended that (a) the percent above cut points be reported, rather than percents at proficiency 
levels; (b) only one percent above cut points, or two at most, rather than percent above cut points for 
all proficiency levels, be reported; and (c) standard errors for percent above cut points be reported 
(Yen, 1997). The first two recommendations were suggested to reduce confusion in reporting scores 
to the public. The third recommendation is standard practice in reporting scores for accountability or 
other purposes. 


The provision of valid, reliable, and fair information has been covered in the previous purpose 
statements, through the various studies involving test content, internal structure, relations to other 
variables, response processes, and testing consequences. The additional studies needed to validate 
the accountability uses of Smarter Balanced summative assessment scores are studies involving the 
reliability and validity of aggregate scores used for accountability. Of particular importance is the 
reliability of aggregate scores. 


Studies Evaluating the Reliability/Precision of Aggregate Scores. Individual schools will be one 
aggregate level of analysis in federal accountability, and so the reliability or error associated with 
school-level results will need to be estimated as part of the validity research agenda. If accountability 
results will be reported at more micro levels, such as classrooms, the reliability or error associated 
with those results would need to be estimated as well. The goals of the measurement precision 
studies to be done here are to provide an estimate of the error inherent in any aggregate scores that 
are reported for the summative assessments and to judge the utility of the information given the 
estimates of error. It is possible that these studies will support the use of the summative assessment 
data for accountability purposes at some levels (e.g., districts) but not others (e.g., schools), because 
of the increased sampling error associated with smaller numbers of students. 


Several methods have been proposed to estimate the reliability, or standard errors, associated with 
aggregate scores from statewide assessments. Yen (1997) used generalizability theory (G-theory) to 
estimate the reliability of school-level results for percent-above-cut statistics associated with the 
Maryland State Performance Assessment program and evaluated a criterion of achieving a standard 
error, of these percents, of 2.5% or less. She concluded that this was an unrealistic criterion for 
performance assessments in a single subject area, but could be reached when evaluating a 
composite across subject areas. Her study illustrated the utility of G-theory for estimating standard 
errors for aggregate statistics, regardless of the item formats that are used. 
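
To give a sense of scale for a criterion such as a 2.5% standard error, the sketch below uses the simple binomial formula for the sampling error of a percentage. This is an assumption made for illustration and is not Yen's G-theory analysis: it treats students as a simple random sample and ignores measurement error, so it understates the total error. The function names are hypothetical.

```python
from math import ceil, sqrt


def se_percent_above_cut(p, n):
    """Binomial standard error, in percentage points, of a percent above cut.

    p: proportion of students above the cut (0 to 1).
    n: number of students tested.
    """
    return 100.0 * sqrt(p * (1.0 - p) / n)


def min_n_for_se(p, max_se_percent):
    """Smallest n whose binomial standard error is at most max_se_percent."""
    needed = p * (1.0 - p) / (max_se_percent / 100.0) ** 2
    # Round first so floating-point noise does not push the result up by one.
    return ceil(round(needed, 9))


# Sampling error alone for a school with 100 tested students and half above the cut.
print(se_percent_above_cut(0.5, 100))

# Number of students needed to reach 2.5 points under the same assumptions.
print(min_n_for_se(0.5, 2.5))
```

Under these simplified assumptions, a school with half of its students above the cut would need several hundred tested students to reach a 2.5-point standard error, which is consistent with the concern about smaller units raised above.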


Hill and DePascale (2003) asserted that the reliability of decisions at the school level should be 
evaluated from a decision consistency perspective. That is, if the assessment were repeated, would 
a school receive the same (AYP) classification? Hill and DePascale (2002) listed four methods for 
estimating school classification consistency. The first, “direct computation,” is based on errors 
associated with each single classification and “uses areas under the normal curve to determine the 
probability of a correct classification” (p. 4). The second method is based on randomly dividing the 
students in a school into two groups and calculating the accountability statistics on each half. The 
third method involves randomly selecting (with replacement) multiple samples from a school, and 
the fourth method involves Monte Carlo simulation, where the parameters for a school are estimated 
and then random draws of students are made. In all four methods, the consistencies in schools’ 
classifications are evaluated. Hill and DePascale recommend using at least two methods to offset 
the disadvantage of any single method. 
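
The second and third methods described above can be sketched as follows. This is only an illustration under stated assumptions: the classification rule (a share of students at or above a cut score reaching a target), the function names, and the simulated scores are all hypothetical, and only sampling error is modeled, not measurement error.

```python
import random

def meets_target(scores, cut, target):
    """Classify a school: True if the share of students at or above `cut` reaches `target`."""
    return sum(s >= cut for s in scores) / len(scores) >= target

def split_half_consistency(scores, cut, target, reps=1000, seed=0):
    """Share of random half-splits in which both halves receive the same classification."""
    rng = random.Random(seed)
    same = 0
    for _ in range(reps):
        shuffled = scores[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        same += meets_target(shuffled[:mid], cut, target) == meets_target(shuffled[mid:], cut, target)
    return same / reps

def bootstrap_consistency(scores, cut, target, reps=1000, seed=0):
    """Share of with-replacement resamples classified the same way as the observed school."""
    rng = random.Random(seed)
    observed = meets_target(scores, cut, target)
    same = 0
    for _ in range(reps):
        resample = [rng.choice(scores) for _ in scores]
        same += meets_target(resample, cut, target) == observed
    return same / reps

# Hypothetical school of 60 students with made-up scale scores.
rng = random.Random(42)
school = [rng.gauss(505, 40) for _ in range(60)]
print(split_half_consistency(school, cut=500, target=0.5))
print(bootstrap_consistency(school, cut=500, target=0.5))
```

Because each half contains only half of the students, the split-half figure describes a school of half the size and will tend to understate consistency for the full school.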


Regardless of the method used to estimate the reliability of or error associated with aggregate 
summative assessment statistics used for accountability, it is important that the estimates address 
both measurement error and sampling error (Hill & DePascale, 2003; Linn, Baker, & Betebenner, 
2002), as do the aforementioned approaches by Yen (1997) and Hill and DePascale (2002, 2003). 


Simulation or empirical studies should also be conducted to evaluate the impact of factors outside of 
a school’s control (or outside of the control of whatever the unit of inference is, such as a teacher) on 
the accountability results. For example, the inference made about a district or a school should not be 
statistically biased based on the number of students, the number of subgroups of students, or other
factors beyond instruction. By estimating and using standard errors associated with aggregate 
scores when making accountability decisions, the validity of those decisions will be enhanced. 


Simulation and other studies could also be used to inform accountability decisions such as how 
many years of data should be used to evaluate a district, school, or other unit of interest. 


The degree to which derivative measures of summative assessment scores, such as “growth”
measures, will be used in accountability systems is not known at the time of this writing. Any
derivative measures would need to demonstrate evidence of reliability and validity. The Standards 
made this point when discussing what today might be considered a “growth” score: “When change or 
gain scores are used, the definition of such scores should be made explicit, and their technical 
qualities should be reported” (AERA et al., 1999, p. 167). Unfortunately, many of the current score 
derivatives, such as growth percentiles and value-added scores for teachers, have not been widely 
studied. As Brennan (2011) lamented, “to the best of my knowledge the subject of error variances
and measures of precision for measures of growth is largely uncharted territory” (pp. 16-17). 


Validity Studies Based on Relations to Other Variables. The use of summative assessment results for 
federal accountability purposes will certainly involve the use of achievement level results. In addition 
to the reliability studies described above, the earlier studies supporting the use
of achievement level standards are also relevant. However, additional studies are needed to support 
the utility of aggregate results based on achievement level results. For example, are the schools that 
are identified as not making adequate progress, based on percentages of “Proficient” or “on track” 
students, really the schools that should be flagged? Studies that could be designed to answer this 
question include using other measures of student achievement to classify schools into performance 
categories, and single-case studies where schools identified as over- or underperforming are 
carefully reviewed to evaluate the classification. 


With respect to other measures of student achievement, at the high school level, changes in 
summative assessment scores for a school could be compared with the school’s changes in scores 
on AP and college admissions tests. Perhaps students’ fees for these admissions tests could be covered
to remove the self-selection problem. At the middle school level, ACT’s and the College Board’s
assessments for younger students (EXPLORE, PLAN, ReadiStep) could be used. 


Validity Studies Based on Testing Consequences. The use of test scores for accountability has been 
accused of causing many problems, such as decreased teacher morale, increased pressure on 
students, and narrowing of the curriculum. As described earlier for purposes 1 through 3, these 
criticisms could be studied using comprehensive surveys of students and teachers, both before and 
after the implementation of the summative assessments. Surveys could be used to understand the 
effects on students (e.g., anxiety, educational aspirations), teachers (morale, retention, movement 
into non-tested subject areas, instruction), administrators (e.g., teacher recruitment and retention, 
effectiveness of school improvement), and parents (e.g., observations of their child, school choice). 
Teacher retention rates and teachers’ movement into non-tested subject areas should also be 
tracked and studied. 


Summative Assessment Purpose 7: 


Provide valid, reliable, and fair information about students’ achievement in ELA and 
mathematics that is equitable for all students and subgroups of students.


There are several features of the Smarter Balanced summative assessments that support equitable 
assessment across all groups of students. For example, the assessments are developed using the 
principles of universal test design; test accommodations are provided for students with disabilities; 
and Spanish-language versions of the math assessments will be developed. In addition, there is a 
specific work group for accessibility and accommodations, and the Consortium has developed seven
sets of guidelines to facilitate accessibility of the assessments. These include general accessibility 
guidelines for item writing and reviewing (Measured Progress & ETS, 2012) and guidelines for 
creating audio, sign language, and tactile versions of the items. The Consortium also developed 
guidelines for item development that aim toward reducing construct-irrelevant language complexities
for English language learners (Young, Pitoniak, King, & Ayad, 2012), and comprehensive guidelines
for bias and sensitivity (ETS, 2012b). These documents underscore the Consortium’s commitment to 
fair and equitable assessment for all students, regardless of their sex, cultural heritage, disability 
status, native language, or other characteristics. 


Irrespective of these proactive activities designed to promote equitable assessments, studies must 
be done to provide validity evidence that the assessments are fair for all groups of students. Many of 
the equity issues are delineated in the most recent version of the NCLB Peer Review Guidance (U.S. 
Department of Education, 2009b). For example, these guidelines recommend providing translations 
in appropriate languages and formats (p. 66), and they require statistical evidence of comparability 
across different language versions of assessments (p. 36). These guidelines also require that all 
students be included in the assessment, regardless of disability or English language proficiency 
status.


Of these requirements, statistical evidence of comparability across the English- and Spanish- 
language versions of the math assessments, and across standard and accommodated test 
administrations, is particularly important. For example, the Standards assert, “When multiple 
language versions of a test are intended to be comparable, test developers should report evidence 
of test comparability” (AERA et al., 1999, p. 99). Similarly, the ITC’s Guidelines on Test Adaptation 
(Hambleton, 2005) state that “Test developers/publishers should apply appropriate statistical 
techniques to (a) establish the equivalence of the language versions of the test, and (b) identify 
problematic components or aspects of the test that may be inadequate in one or more of the 
intended populations” (p. 22). Thus, empirical analyses to evaluate the comparability of the English- 
and Spanish-language versions of the math summative assessments are needed. Similar evidence 
will be needed to evaluate the comparability of standard and accommodated tests. 


To evaluate the degree to which the summative assessments are fulfilling the purpose of providing 
valid, reliable, and fair information that is equitable for all students, several studies are 
recommended. These studies are categorized here as validity evidence based on all five sources of 
evidence listed in the Standards. 


Validity Studies Based on Test Content. Validity studies based on test content to support the 
equitability of the assessments will be based on the degree to which the planned universal test 
design, guidelines for assessing English language learners, and other fairness guidelines are 
implemented and followed. Documents regarding sensitivity review, and how items that were flagged 
for DIF were handled, should be reviewed. The test development processes and scoring processes 
are designed to minimize sources of construct-irrelevant variance that would inhibit fairness. The 
degree to which these procedures are followed and documented should be audited. Part of this audit 
should ascertain the degree to which students with disabilities, underrepresented minorities, and 
English language learners were included in the field tests, and the degree to which their special 
characteristics were addressed in scoring. 


Validity Studies Based on Internal Structure. When evaluating the comparability of different 
variations of a test, such as different language versions of an assessment or accommodated test 
administrations, validity studies based on internal structure are most common (Sireci, Han, & Wells, 
2008). These studies most often involve multi-group confirmatory factor analysis (CFA) (e.g., Ercikan 
& Koh, 2005). Weighted (multi-group) multidimensional scaling (MDS) has also been used for this 
purpose (e.g., Robin, Sireci, & Hambleton, 2003; Sireci & Wells, 2010). Both CFA and MDS involve 
simultaneous analysis of the dimensions underlying an assessment, and are used to assess whether 
the dimensionality is invariant across different versions of an exam. The CFA approach allows for 
statistical tests of different levels of invariance (number of dimensions, item factor loadings,
correlations among factors, errors associated with factor loadings). The MDS approach does not 
typically involve statistical tests of invariance, but because it is exploratory, the dimensionality does 
not need to be modeled a priori. 


Multi-group analyses of dimensionality can also be used to evaluate the comparability of scores for 
different subgroups of students who take the same test. For example, Day and Rounds (1998) used 
weighted MDS to look at structural invariance of an assessment across ethnic groups, and Marsh, 
Martin, and Jackson (2010) used multi-group CFA for this same purpose. The validity research 
agenda should use multi-group CFA or MDS to evaluate the invariance of test structure across 
diverse groups of students taking the standard versions of the summative assessments, as well as 
across students taking the standard and accommodated versions of the assessments. 


In addition to comparing the dimensionality of the summative assessments across diverse groups of 
students, simpler analyses based on internal structure should also be performed. Essentially, these 
analyses involve breaking down the results of all studies of measurement precision to the subgroup 
level. Reliability estimates, conditional standard error functions, DC and DA estimates, and average 
standard errors should be reported for all subgroups and all different versions of the assessments.
Given that reliability estimates are influenced by variability in students’ responses, comparisons of 
measurement precision are better if based on estimates of the standard error of measurement. 
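
A small worked example may help show why the standard error of measurement (SEM) is the safer basis for comparison. The sketch below uses the classical relation SEM = SD × sqrt(1 − reliability). The two subgroups and their values are hypothetical.

```python
import math

def sem(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical subgroups: B has a narrower score spread and a lower reliability
# coefficient, yet its scores are measured about as precisely as A's.
print(round(sem(40.0, 0.91), 2))  # subgroup A
print(round(sem(30.0, 0.84), 2))  # subgroup B
```

In this made-up case the reliability coefficients differ (0.91 versus 0.84) only because the score variability differs. The SEM is essentially the same for both groups.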


One other important source of validity evidence to support equitable assessment for all is analysis of 
DIF across test variations and across subgroups of students. There are numerous procedures for 
evaluating items for DIF, and because excellent descriptions of these procedures exist (e.g., Clauser 
& Mazor, 1998; Holland & Wainer, 1993), they are not described here. DIF studies conducted for the 
summative assessments should include an effect size criterion to distinguish statistically significant
DIF from substantively meaningful DIF (i.e., DIF that reflects construct-irrelevant variance). The presence of DIF
does not necessarily indicate bias, and so DIF studies must be followed up by qualitative analysis to 
try to interpret the source of DIF. Finally, the DIF studies should evaluate the aggregate effect of DIF 
at the total test score level, or at least estimate how the presence of some DIF items may affect the 
typical test taker from a subgroup.
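
The text above does not prescribe a particular DIF procedure. As one illustration of an effect size criterion, the sketch below assumes the Mantel-Haenszel statistic expressed on the ETS delta scale (MH D-DIF = −2.35 ln α), with size cut points of 1.0 and 1.5. The complete ETS classification rules also involve significance tests, which are omitted here. Function names and counts are hypothetical.

```python
import math

def mh_d_dif(strata):
    """Mantel-Haenszel D-DIF on the ETS delta scale.

    `strata` holds one tuple per matched total-score level:
    (ref_right, ref_wrong, focal_right, focal_wrong).
    Negative values mean the item is harder for the focal group.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den  # common odds ratio across score levels
    return -2.35 * math.log(alpha)

def size_label(d_dif):
    """Size-only label (simplified: no significance test is applied)."""
    size = abs(d_dif)
    if size < 1.0:
        return "negligible"
    return "moderate" if size < 1.5 else "large"

# Made-up counts for one item at three matched score levels.
item = [(30, 20, 20, 30), (40, 10, 25, 25), (45, 5, 35, 15)]
d = mh_d_dif(item)
print(round(d, 2), size_label(d))
```

Items flagged by a rule of this kind would still need the qualitative follow-up described above, because a large statistic alone does not establish bias.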


Validity Studies Based on Response Processes. The studies involving validity evidence based on 
response processes for purpose 1 are relevant here in that relevant subgroups of students should be 
included in those studies and the results should be broken down by subgroup. In particular, the 
amount of time that different groups of students take to respond to items, both with and without 
accommodations, should be studied. Any cognitive interviews or think-aloud protocols that are 
conducted to evaluate the skills measured by items should be inclusive in recruiting students. In 
addition, specific studies to evaluate accommodations for English language learners or students with 
disabilities should be conducted to determine whether the students are using the accommodations 
and find them helpful (e.g., Duncan et al., 2005). 


Validity Studies Based on Relations to Other Variables. Two types of studies based on relations to 
other variables are relevant for validating that the summative assessments are equitable for all 
subgroups of students. The first are differential predictive validity studies that evaluate the 
consistency of the degree to which the assessments predict external criteria across subgroups of 
students. Zwick and Schlemer (2004) provide an excellent example of this type of analysis with 
respect to the differential predictive validity of the SAT across native English speakers and non-native 
English speakers. These studies will be particularly relevant for the “on track” and “college and 
career readiness” standards associated with the summative assessments. Of course, the caveats 
that were mentioned earlier regarding the validity of the external criteria apply here. 


The second type of study involves a grouping variable as the external variable. Experimental studies 
that have looked at test accommodations fall into this category. For example, in some studies, 
students with and without disabilities are randomly assigned to test accommodation or standard test 
administration conditions. The validity hypothesis investigated is one of “differential boost,” which 
states that students with disabilities will have larger score differences across the accommodated 
and standard conditions than students without disabilities, and that their scores will be higher in the 
accommodated condition (Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000). 
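
The differential boost hypothesis amounts to a difference in differences across the four cells of such a design. The following sketch shows only the arithmetic. The scores are invented, and an actual study would test the interaction formally (for example, with an analysis of variance) rather than compare means by eye.

```python
from statistics import mean

def differential_boost(swd_accom, swd_std, non_accom, non_std):
    """Return (boost for students with disabilities, boost for students
    without disabilities, difference between the two boosts)."""
    boost_swd = mean(swd_accom) - mean(swd_std)
    boost_non = mean(non_accom) - mean(non_std)
    return boost_swd, boost_non, boost_swd - boost_non

# Invented scale scores for the four randomly assigned cells.
result = differential_boost(
    swd_accom=[480, 490, 500], swd_std=[460, 470, 480],
    non_accom=[520, 530, 540], non_std=[515, 525, 535],
)
print(result)
```

A positive third value, together with a positive boost for students with disabilities, is the pattern the hypothesis predicts.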


Non-experimental studies using grouping variables could also be conducted using an expected 
hypothesis of no difference across groups. For example, using changes in students’ scale scores 
over time as the dependent variable, comparisons could be made across students of different ethnic 
groups, SES, sexes, and other demographic characteristics. 


In addition to the studies previously described in this section, all other studies conducted on the 
general population could be broken down by subgroup to evaluate consistency of the results across 
subgroups, where sample sizes permit. For example, if multitrait-multimethod studies are conducted,
a study of the invariance of results across subgroups may prove interesting. 


Validity Studies Based on Testing Consequences. The analysis of the results from the summative 
assessments across subgroups of students will be a good starting point for understanding if there 
are differential consequences for certain types of students. In describing validity studies based on 
testing consequences for other purposes of the summative assessments, we discussed investigating 
the effects on instruction, teacher morale, and students’ emotions and behaviors (e.g., dropout, 
course-taking patterns). These results should also be broken out by subgroup, but more importantly, 
the changes in instructional decisions for students should be investigated at the subgroup level. 
Important analysis questions include: Are minority students dropping out of school at higher rates 
than non-minorities? Are the success rates for remedial programs higher for certain types of 
students? 


VI. Validity Agenda for Interim Assessments 


The Smarter Balanced interim assessments differ from the summative assessments in that they are 
optional, include both secure and non-secure components, are customizable across users, can be 
administered multiple times within a school year, and are designed to provide information at a finer 
level of detail with respect to students’ strengths and weaknesses in relation to the CCSS. The 
validity studies described for the summative assessments are essentially all relevant to the interim 
assessments, but additional validation work needs to address the degree to which the interim 
assessments provide the intended diagnostic information and are useful to teachers, administrators, 
and other educators for improving instruction and student learning. 


As indicated in Chapter III, four purpose statements for validation are associated with the interim
assessments. The proposed studies to support the validity of these statements are described in this 
section. 


Interim Assessment Purpose 1: 


Provide valid, reliable, and fair information about students’ progress toward mastery of the 
skills measured in ELA and mathematics by the summative assessments. 


To support this purpose, validity evidence should confirm that the knowledge and skills being 
measured by the interim assessments cover the knowledge and skills measured on the summative 
assessments and that the interim assessment scores are on the same scale as those from the 
summative assessments. As indicated in Table 2 (p. 15), the studies providing this evidence will
primarily be based on test content, internal structure, and response processes. 


Validity Studies Based on Test Content. The content validity studies described for the summative 
assessments will gather data relevant to the interim assessments. However, an additional level of 
analysis will be required to support the validity of reporting students’ performance at the content 
cluster levels. The sample summary of a content validity study reported in
Figure 4 (p. 30) suggests how results could be summarized for the content clusters targeted by the
interim assessments. Moreover, the data from such studies could be used to select the best items 
for interim assessment purposes. That is, items that are rated as measuring their intended CCSS 
“very well” could be selected for the interim assessment item bank. 


The interim assessments are intended to help teachers focus assessment on the most relevant 
aspects of their instruction at a particular point in time. Thus, the interim assessments should better 
align with teachers’ instruction, if the content clusters are appropriately selected. To evaluate this 
intended benefit of the interim assessments, surveys could be given to teachers regarding the
instructional objectives that they cover at several points during the school year (i.e., scope and 
sequence survey). Then, the content clusters that were administered to these teachers’ students at 
specific points in time can be evaluated ex post facto, and the match between what was taught and
what was assessed can be calculated. This type of survey could be coupled with survey questions 
regarding the utility of the interim assessments, which is relevant to purpose 2. 


Validity Studies Based on Internal Structure. Scores from the comprehensive interim assessments 
are intended to be on the same scale as those from the summative assessments, to best measure 
students’ progress toward mastery of the knowledge and skills measured on those assessments. 
This intent requires linking the scores from the interim and summative assessments. Given that 
many of the items in the interim assessment item bank will also be used on the summative 
assessments, it is assumed that some type of common item equating will be used to place students’ 
performance on the interim assessments on the summative assessment score scale. This equating 
should be evaluated to support the inferences about how well students are likely to do on the 
summative assessments based on their interim assessment scores. Studies in this area would
include an audit of the equating procedures, such as analysis of equating error and analysis of DIF of
equating items across groups of students defined by state, ethnicity, or other factors (or a more 
formal population invariance study; Dorans, 2004). In addition, the degree to which interim 
assessment items fit the IRT models determined by the summative assessment scale should be 
ascertained. The fit of the equating items to this model will be of particular interest. 


Also under the realm of internal structure is evidence regarding the reliability or measurement 
precision of scores from the interim assessments. Less measurement precision relative to that of the 
summative assessments is tolerable because (a) the stakes are lower, (b) there will be multiple 
assessments, and (c) these assessments supplement the summative assessments, on which higher- 
stakes decisions are based. However, studies should be conducted to ascertain the reliabilities and 
errors of measurement associated with any scores reported from the interim assessments so that 
they can be properly interpreted. If achievement level classifications are made on the basis of these 
assessments, then estimates of DC and DA should also be calculated. 


Studies should also be conducted to evaluate the quality and accuracy of local scoring of the 
performance tasks associated with the interim assessments. Two checks will provide evidence
regarding the quality of local scoring: having trained scorers rescore samples of locally scored tasks,
and measuring how closely local scorers reproduce the established scores on training sets of responses.
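
One simple way to summarize a rescoring study is with exact and adjacent agreement rates. The sketch below assumes integer rubric scores and paired local and trained-scorer ratings. The function name and the ratings are hypothetical, and other indices (such as weighted kappa) could be used instead.

```python
def agreement_rates(local, expert):
    """Exact agreement and exact-or-adjacent (within one score point)
    agreement between paired scores."""
    if len(local) != len(expert) or not local:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [abs(a - b) for a, b in zip(local, expert)]
    exact = sum(d == 0 for d in diffs) / len(diffs)
    within_one = sum(d <= 1 for d in diffs) / len(diffs)
    return exact, within_one

# Invented scores for eight responses on a 0-4 rubric.
local_scores = [2, 3, 1, 4, 2, 0, 3, 2]
expert_scores = [2, 3, 2, 4, 2, 2, 3, 1]
print(agreement_rates(local_scores, expert_scores))
```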


Validity Studies Based on Response Processes. Interim Assessment Purpose 1 relates to skills 
measured on the summative assessments, and so the validity studies based on response processes 
that were described for the summative assessments are relevant here in order to confirm that the 
items are measuring higher-order skills. The response process studies for Summative Assessment 
Purpose 1 should include items that will be used on the interim assessment. The results from these 
studies should be used to “assure that each item or task clearly elicits student responses that 
support the relevant evidence statements and thus are aligned to the associated claims and 
standards” (ETS, 2012c, p. 4). 


Interim Assessment Purpose 2: 


Provide valid, reliable, and fair information about students’ performance at the content 
cluster level, so that teachers and administrators can track student progress throughout the 
year and adjust instruction accordingly.


As shown in Table 2, validity evidence to support this purpose of the interim assessments will rely on 
studies of test content, internal structure, and testing consequences. 


Validity Studies Based on Test Content. Assuming that the content validity/alignment studies 
described for the summative assessments are conducted, all items on those assessments will be 
rated regarding the degree to which they measure their intended CCSS and their intended cognitive 
skills. These studies should be extended to include the items on the interim assessments that do not 
overlap with the summative assessments. However, an additional study is needed to support 
purpose 2. A study should be conducted to confirm that the content clusters associated with the 
interim assessments represent helpful groupings of CCSS that are useful for tracking progress and 
adjusting instruction. Such a study would evaluate whether the specific groupings of standards from
the CCSS into content clusters are instructionally beneficial.


Like all content validity studies, this study would require SMEs. Rather than reviewing items, the 
SMEs would review the CCSS that were used to create the content clusters for each claim area. Their 
task could be to group the standards in a way that would be best for providing instructionally 
relevant information. Their groupings of standards could then be compared to how the standards 
were grouped into the content clusters, and the consistency across the actual and SME-derived 
clusters could be calculated. Alternatively, the SMEs could review the content clusters and rate them
for their instructional relevance, and make comments about whether and how they might be
rearranged. 
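
The consistency between the actual and SME-derived clusters could be quantified in several ways. One simple option, assumed here purely for illustration, is the Rand index: the share of standard pairs that the two groupings treat the same way. The labels below are invented.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of item pairs on which two groupings agree: the pair is placed
    together in both groupings, or apart in both."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two equal-length groupings of at least two items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Six hypothetical standards: actual content clusters versus one SME's grouping.
actual = [1, 1, 1, 2, 2, 3]
sme = ["a", "a", "b", "b", "b", "c"]
print(round(rand_index(actual, sme), 3))
```

A value of 1.0 means the two groupings are identical up to relabeling. A chance-corrected variant (the adjusted Rand index) may be preferable when the number of clusters is small.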


Validity Studies Based on Internal Structure. Information regarding the reliability and measurement 
error of cluster-level score reporting should be provided. In addition, the degree to which different
clusters are correlated should be reported, to see whether clusters measuring different assessment
targets or claims correlate less than clusters measuring the same claims and targets. A multitrait-
multimethod approach could be used, using the different item formats and different claim areas as
methods and traits, respectively (Pitoniak, Sireci, & Luecht, 2002). 


Validity Studies Based on Testing Consequences. The interim assessments are designed to “provide 
more immediately actionable data for teachers and students” (ETS, 2012c). A primary validity 
question to be studied is: Do the content cluster results help teachers and administrators track 
student progress and adjust instruction? To assess the effects on instruction, studies should be 
conducted to (a) track the use of the interim assessments and their associated supports (e.g., user 
tutorials), (b) assess the degree to which teachers and administrators find the system easy to
navigate, and (c) assess the degree to which teachers and administrators value the information 
provided and use it to adjust instruction. Studies could also be conducted to ascertain students’ 
impressions of the system. 


Tracking the use of the interim assessments should be straightforward, assuming that most of the 
assessments are accessed online and that these testing occasions are captured by the system. 
Procedures should be in place to track any uses that are not online. Surveys of teachers and 
administrators will be needed in order to understand the degree to which these educators find the 
system useful and easy to navigate. Surveys of teachers and administrators will also be needed to 
ascertain the effects on instruction. As part of that study, “high use” teachers and schools should be 
identified and selected for further inquiry. Surveys, interviews, and focus groups of these teachers 
should be conducted, to learn about how they used interim assessment results to improve 
instruction. 


Interim Assessment Purpose 3: 


Provide valid, reliable, and fair information about individual and group (e.g., school, district) 
performance at the claim level in ELA and mathematics, to determine whether teaching and 
learning are on target. 


As shown in Table 2, validity evidence to support this purpose of the interim assessments will rely on 
studies of internal structure, relations to other variables, and testing consequences. 


Validity Studies Based on Internal Structure. This purpose statement is similar to purpose 2, with the 
difference being that rather than a focus at the content cluster level, the focus here is on the claim 
level. The studies described for purpose 2 are all relevant here. The additional studies needed would 
need to evaluate the reliability and precision of the claim scores at the group level. It is assumed that 
claim-level information will be provided by the interim assessments during the school year, and so 
estimates of the precision of this information should be provided, using the same types of internal 
structure studies described for purposes 1 and 2. 


Validity Studies Based on Relations to Other Variables. Given that the interim assessments will 
provide information at the claim level throughout the school year, it would be good to study the 
degree to which the information provided for individual students or groups of students is consistent 
with other measures of their performance relative to the CCSS. One way to study this relationship is 
to see how well the claim scores for the interim assessments predict claim scores on the summative 
assessments. In particular, it would be interesting to assess the degree to which students who are 
considered “on target” or “not on target” are classified similarly on the summative assessments. 
More interesting, however, would be to qualitatively study students who are mispredicted. That is, if
a student did poorly on an interim assessment but well on a summative assessment, is that a
success story or a story of poor measurement by the interim assessment? If other measures of 
student achievement are available, they would be helpful for shedding light on this issue, but it may 
be difficult to find other measures tied to the same CCSS that specific interim assessments are 
measuring. Nevertheless, assessments such as NWEA’s Measures of Academic Progress or 
Curriculum Associates’ iReady assessment may be relevant. 
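
The comparison of interim and summative classifications described above could begin with a simple agreement table. The sketch below computes observed agreement and Cohen's kappa for paired binary classifications and lists the mispredicted students for qualitative follow-up. The 0/1 coding, the function name, and the data are hypothetical.

```python
def classification_agreement(interim, summative):
    """Observed agreement and Cohen's kappa for two paired binary
    classifications coded 1 (on target) and 0 (not on target)."""
    n = len(interim)
    observed = sum(a == b for a, b in zip(interim, summative)) / n
    p_i = sum(interim) / n
    p_s = sum(summative) / n
    expected = p_i * p_s + (1 - p_i) * (1 - p_s)  # agreement expected by chance
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Invented classifications for ten students.
interim = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
summative = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(classification_agreement(interim, summative))

# Students whose interim and summative classifications differ.
mispredicted = [i for i, (a, b) in enumerate(zip(interim, summative)) if a != b]
print(mispredicted)
```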


Validity Studies Based on Testing Consequences. As mentioned for purpose 2, the intended 
consequence of the interim assessments is to connect the assessments to instruction to improve 
student learning. The validity studies based on testing consequences that were described for 
purpose 2 are all relevant here, with the only difference being that the information provided would be 
at the claim level and would be extended to groups of students. Therefore, the studies described 
earlier should include these factors to provide validity evidence in support of purpose 3. In addition, 
should in-class activities (classroom interaction tasks) become part of the interim assessment
system, their effectiveness should be a focus of the surveys, interviews, and focus groups associated 
with the studies mentioned earlier. 


Interim Assessment Purpose 4: 


Provide valid, reliable, and fair information about student progress toward the mastery of 
skills measured in ELA and mathematics across all students and subgroups of students.


Validity evidence in support of this purpose should come from all five sources. The validity studies 
based on test content that were described with respect to purposes 1 and 2 provide the starting 
point for equitable measurement across all students. The validity studies based on internal structure 
should report any estimates of reliability, measurement precision, DC, or DA separately for all 
subgroups of students, and for students who take different variations of the interim assessments. In
addition, it should be documented that access to the interim assessments has been provided to all 
students, as was discussed in relation to the summative assessments. Such access should include 
appropriate test accommodations for students with disabilities and English language learners. 
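
As an illustration of reporting reliability separately by subgroup, the following is a minimal sketch that assumes coefficient alpha as the reliability estimate. The report does not prescribe a particular estimate, and the subgroup labels, function names, and item scores here are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of examinee rows (one score per item)."""
    k = len(item_scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of examinees' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_by_subgroup(records):
    """records: iterable of (subgroup_label, item_score_row) pairs."""
    groups = {}
    for label, row in records:
        groups.setdefault(label, []).append(row)
    return {label: cronbach_alpha(rows) for label, rows in groups.items()}


# Hypothetical data: three dichotomous items, two illustrative subgroups.
records = [
    ("ELL", [1, 1, 1]), ("ELL", [0, 0, 0]),
    ("ELL", [1, 1, 0]), ("ELL", [0, 0, 1]),
    ("non-ELL", [1, 1, 1]), ("non-ELL", [0, 0, 0]),
    ("non-ELL", [1, 0, 1]), ("non-ELL", [1, 1, 0]),
]
result = alpha_by_subgroup(records)
for label, alpha in sorted(result.items()):
    print(f"{label}: alpha = {alpha:.3f}")
```

In practice the same breakdown would be repeated for each variation of the interim assessment, and operational data would call for estimates suited to the test design rather than this simple fixed-form case.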


The Peer Review Guidance for NCLB assessments stipulates that states should “Provide written 
documentation of criteria for local assessments, which ensures technical quality and comparability 
to state assessments of locally used tests for ALL subgroups and content areas (includes 
modified/alternate assessments)” (U.S. Department of Education, 2009b, p. 32). The interim 
assessment system allows states and districts to create their own assessments from the banks of 
items, and so the technical quality of these local assessments will need to be studied to ensure that 
they provide comparable measurement across all groups of students. 




VII. Research Agenda for Formative Assessment Resources 


The third component of the Smarter Balanced Assessment Consortium is formative tools and 
processes, referred to in this report as formative assessment resources. These resources are not 
assessments per se, and so their evaluation does not neatly fit into the Standards’ five sources of 
validity evidence. Rather, these resources are intended to work with the summative and interim 
assessments to increase their utility for improving instruction and helping students learn. Essentially, 
the formative assessment resources are what puts the “balance” in the Smarter Balanced 
Assessment Consortium. 


The purposes of the formative assessment resources that are the focus of the comprehensive 
research agenda were listed in Chapter III, and, for convenience, are repeated here. 


The purposes of the Smarter Balanced formative assessment resources are to provide measurement 
tools and resources to: 


1. Improve teaching and learning. 

2. Monitor student progress throughout the school year. 

3. Help teachers and other educators align instruction, curricula, and assessment. 

4. Help teachers and other educators use the summative and interim assessments to improve 
instruction at the individual student and classroom levels. 


5. Illustrate how teachers and other educators can use assessment data to engage students in 
monitoring their own learning. 


To accomplish these goals, the formative assessment resources will provide tools and professional 
development materials including a “Digital Library,” learning modules (lesson plans, templates, 
curriculum resources, evidence collection tools, video clips of classroom instruction and teacher 
analysis, descriptive feedback strategies, follow-up planning materials), online assessment literacy 
training products, webinars, tutorials, and PowerPoint presentations. To oversee the development, 
implementation, and maintenance of these resources, extensive collaboratives will be established, 
including: 


• National Advisory Panel 

• Digital Library Review Board 

• State Leadership Teams 

• State Networks of Educators 

• Formative Assessment Practices and Professional Learning Work Group 


The research agenda for this component of the Consortium will be an evaluation of the products 
developed for these purposes and of the processes for developing them. Studies comprising this 
evaluation should involve (a) confirming the development and successful implementation of all 
planned formative assessment resources; (b) evaluating usage statistics of all tools and other 
resources; (c) reviewing all documents supporting the system; (d) conducting comprehensive surveys of the 
collaborative leadership involved in overseeing the products and processes; (e) conducting comprehensive 
surveys of users of the resources (teachers, administrators, students, parents); and (f) conducting case studies 
of teachers and administrators who are frequent users of the resources. It should also be confirmed 
that teachers were involved in the development and review of these materials. 




Confirming Development and Successful Implementation of Products 


The RFP for the “Digital Library with Formative Assessment Practices and Professional Learning 
Resources for Educators,” hereafter referred to as RFP-23, specifies the development of several 
products using specific processes. An important step in the evaluation of the formative assessment 
resources is to confirm that all of the deliverables associated with this contract were satisfied. For 
example, RFP-23 calls for the development of at least 50 exemplar instructional modules (p. 26). 
The successful creation of these modules, and other tasks, will be audited as part of the evaluation. 
In addition, goals related to the review and implementation of all resources will be reviewed in this 
evaluation. This step will merely confirm that the intended products and activities occurred and note 
the timeliness of the deliverables. The quality of the products and their implementation will be 
evaluated using other activities described later in this chapter. 


Evaluating Usage Statistics 


The formative assessment resources are designed to be used by teachers, administrators, and even 
parents and students. If these resources are not understood and found useful, the system will be 
unbalanced, which will inhibit the goals of the entire Consortium. One way to evaluate the utility of 
the resources is to analyze their usage statistics. RFP-23 specifies reporting monthly usage statistics 
(p. 71). These statistics should be analyzed over time. Formative evaluation should inform the 
Smarter Balanced leadership about which resources are being used and which are not, so that 
better advertising or improvement of the underutilized resources can be considered. Analysis of 
usage data should be broken down by state, and by important subcategories within states, such as 
type of school, geographic region, percentage of certain subgroups of students within a school 
(English language learners, low-SES, etc.), and, where possible, demographics of the users. 
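
The breakdown described above can be sketched as a simple aggregation. This is only an illustration: the record layout, field names, state codes, and the threshold for "underutilized" are all assumptions, since RFP-23 is cited here only as requiring monthly usage statistics.

```python
from collections import defaultdict


def monthly_usage_by_group(events, group_key):
    """Sum access counts per (group, resource, month)."""
    totals = defaultdict(int)
    for e in events:
        totals[(e[group_key], e["resource"], e["month"])] += e["accesses"]
    return dict(totals)


def underused_resources(events, threshold):
    """Resources whose total accesses across all records fall below threshold."""
    totals = defaultdict(int)
    for e in events:
        totals[e["resource"]] += e["accesses"]
    return sorted(r for r, n in totals.items() if n < threshold)


# Hypothetical monthly usage records.
events = [
    {"state": "A", "school_type": "rural", "resource": "lesson_plans",
     "month": "2015-01", "accesses": 120},
    {"state": "A", "school_type": "urban", "resource": "lesson_plans",
     "month": "2015-01", "accesses": 80},
    {"state": "A", "school_type": "rural", "resource": "webinars",
     "month": "2015-01", "accesses": 5},
    {"state": "B", "school_type": "urban", "resource": "lesson_plans",
     "month": "2015-02", "accesses": 60},
    {"state": "B", "school_type": "urban", "resource": "webinars",
     "month": "2015-02", "accesses": 3},
]

by_state = monthly_usage_by_group(events, "state")
by_school_type = monthly_usage_by_group(events, "school_type")
flagged = underused_resources(events, threshold=50)
print(flagged)
```

Passing a different `group_key` (school type, region, and so on) reuses the same function for each subcategory named in the text.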


Document Review 


RFP-23 specifies several documents that are important to the integrity of the formative assessment 
resources. These documents include: 


• Comprehensive development strategy 
• Biannual implementation reports 
• Documentation of component plans and processes 
• Description of recruiting and creation of leadership committees (State Leadership Teams, State 
Networks of Educators) 
• Records of decision-making by leadership committees 
• Technical documentation of system components 


These documents will be reviewed to ensure that products are developed as intended and processes 
are followed. Any problems discovered in the documents should be followed up on to see if they were 
properly resolved. In addition, RFP-23 requires the contractor to perform and document quality 
assurance testing (pp. 69-70). This documentation will also be reviewed as part of the evaluation. 
Monitoring reports on user comments (p. 71) will also be reviewed and reported on. 


Surveys, Interviews, and Focus Groups of Leadership 


The plan for developing, implementing, and improving the formative assessment resources calls for 
full participation of educators throughout the Consortium. In particular, the State Networks of 
Educators will involve carefully selected end-users of the resources. In the evaluation, the five 
aforementioned collaboratives of leaders (National Advisory Panel, Digital Library Review Board, 
State Leadership Teams, State Networks of Educators, Formative Assessment Practices and 
Professional Learning Work Group) will be solicited to participate in surveys, interviews, or focus groups to 
obtain their impressions of the process, the quality of the products, and the degree to which the 
formative assessment resources are accomplishing the intended goals. In addition, the intended 
representation of the membership of these committees with respect to geographic region, subject 
expertise, representation of special populations, and other characteristics will be evaluated. 


Surveys of Users 


The evaluation activities previously described will provide information on the quality of the products 
and processes and the degree to which users are accessing the resources. However, it is also critical 
to gather information regarding the degree to which the resources are perceived as being helpful to 
educators. RFP-23 includes the development of a survey to assess the effectiveness of the regional 
meetings (p. 23). The results from that survey should be considered in the evaluation. More 
importantly, however, we recommend that the research agenda include large-scale surveys of all 
users. Given that the bulk of the resources must be accessed online, we recommend that user 
surveys be implemented as part of the system. That is, at strategic points in time, users should be 
required, or heavily encouraged, to take brief surveys, for the Consortium to obtain their opinions 
regarding the usefulness of the materials and how they use the resources in their instructional 
practices. The surveys should target the specific aspects of the resources (e.g., lesson plans, 
evidence collection tools, assessment literacy training products, understanding how to use 
summative and interim data to improve instruction, etc.). Surveys to evaluate training programs 
delivered as part of the implementation of the resources (e.g., RFP-23, p. 65) are also needed. These 
surveys are needed in order to provide evidence that the formative assessment resources are having 
an impact on classroom practices. 


Teacher survey data could also be used to create an implementation index for participating teachers, 
and those data could be correlated with students’ test scores. In particular, it would be interesting to 
correlate teachers’ implementation data with the progress that students make within the school year 
while they have the teacher. If all aspects of the system work as intended, teachers who successfully 
use the formative assessment resources will be able to use the summative and interim assessment 
results to improve instruction, and will see greater gains for their students, relative to comparable 
teachers who do not use the resources. 
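
One way the implementation index and its relation to student progress might be computed is sketched below. It assumes the index is the mean of a teacher's survey item ratings and that progress is summarized as a mean within-year score gain per teacher; neither definition comes from the report, and all names and values are hypothetical.

```python
from math import sqrt


def implementation_index(ratings):
    """Mean of a teacher's survey item ratings (e.g., on a 1-5 scale)."""
    return sum(ratings) / len(ratings)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical teachers: survey ratings and the mean within-year gain
# of the students they taught.
teachers = {
    "T1": {"ratings": [1, 2, 1], "mean_gain": 10.0},
    "T2": {"ratings": [3, 3, 3], "mean_gain": 14.0},
    "T3": {"ratings": [4, 5, 3], "mean_gain": 15.0},
    "T4": {"ratings": [5, 5, 5], "mean_gain": 21.0},
}

index = [implementation_index(t["ratings"]) for t in teachers.values()]
gains = [t["mean_gain"] for t in teachers.values()]
r = pearson_r(index, gains)
print(f"r = {r:.3f}")
```

A correlation of this kind is descriptive only. Supporting the comparison to "comparable teachers" would require controlling for differences among teachers and their students.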


It is also important to gather data on the degree to which parents, students, teachers, 
administrators, and others understand the reports from the summative and interim assessments. 
These data can be gathered using surveys to obtain opinions of the reports, and also by testing these 
individuals regarding the accuracy of their interpretations (Wainer et al., 1999). 


Case Studies of Frequent Users 


The usage data for the formative assessment resources can be used to identify teachers and 
administrators who are frequent users. A sample of these frequent users can be selected and 
recruited for in-depth study of how they use the resources. The appropriateness of their practices 
can be documented, and ideas for improving the resources, and for sharing the lessons learned by 
these teachers and administrators, can be reported. 




VIII. Summary: The Smarter Balanced Assessment Consortium Validity Argument 


The preceding chapters describe a multitude of studies that comprise the comprehensive research 
agenda for the Smarter Balanced Assessment Consortium. The presentation of the agenda 
according to the different components of the system may result in two misleading perceptions. These 
potential misleading perceptions are: 


• The research agenda is too ideal to be practical because the agenda is too voluminous and 
optimistic. 

• The research agenda is fragmented and so does not address the holistic goals of the 
Consortium. 


In this chapter, we put those potential misperceptions to rest by illustrating the integration of studies 
across the various components and illustrating how many of the studies are already addressed in 
the test development and formative assessment resources development activities. 


The integration of the various studies results in an agenda that, if properly implemented, can provide 
a convincing validity argument to support the goals of the Consortium as stated in its Theory of 
Action (Appendix A). Bennett (2010) posited five questions that should be posed to evaluate a theory 
of action for a comprehensive assessment system such as Smarter Balanced. These five 
questions are: 


• Is the theory of action logical, coherent, and scientifically defensible? 

• Was the assessment system implemented as designed? 

• Were the interpretive claims empirically supported? 

• Were the intended effects on individuals and institutions achieved, and did the postulated 
mechanisms appear to cause those effects? 

• What important unintended effects appear to have occurred? (p. 82) 


The first question can be addressed by a thoughtful review of the Smarter Balanced Theory of Action 
as a preliminary step in the evaluation. Our impression is that the theory is defensible, which is 
supported by the fact that we were able to create a comprehensive research agenda to address its 
goals. The second question can be answered by analysis of the results from the studies outlined in 
this report, specifically the audit studies listed in Chapters III and VII and the studies regarding 
validity evidence based on testing consequences that involve surveys, interviews, and focus groups 
of stakeholders (described in Chapters IV through VII). 


What most people think about when considering validation of an assessment system are the third 
and fourth questions posed by Bennett (2010). We, and many others (e.g., Haertel, 1999; Messick, 
1989; Shepard, 1993), would also include the fifth question. These three questions require validity 
evidence beyond typical test development activities, and require evidence stemming from all five 
sources stipulated in the Standards. It is around these three questions that the majority of studies 
described in Chapters V through VII are centered. 


The Smarter Balanced Theory of Action is based on seven principles (Smarter Balanced, 2010). 
These principles are presented in Appendix A and are presented here in more abbreviated form: 


1. Assessments are grounded in a thoughtful, standards-based curriculum and are managed as 
part of an integrated system. 

2. Assessments produce evidence of student performance. 

3. Teachers are integrally involved in the development and scoring of assessments. 

4. The development and implementation of the assessment system is a state-led effort with a 
transparent and inclusive governance structure. 

5. Assessments are structured to continuously improve teaching and learning. 

6. Assessment, reporting, and accountability systems provide useful information on multiple 
measures that is educative for all stakeholders. 

7. Design and implementation strategies adhere to established professional standards. (pp. 32- 
33) 


A review of the purpose statements on which this comprehensive research agenda is based 
(see Chapter III) makes clear that the agenda is focused on evaluating the degree to which 
these principles are realized. To pull the comprehensive research together—that is, to 
document the validity argument for Smarter Balanced in a coherent manner to best inform 
stakeholders and the general public—a report should be produced that indicates how the 
various pieces of evidence gathered through the research agenda confirm that these seven 
principles are realized. If the research agenda outlined in this report is followed, it will provide 
ample evidence that could be presented in a reader-friendly report organized around 
these seven principles. It is clear that the research agenda outlined here addresses the 
seventh principle. Our review of Smarter Balanced activities to date supports the fourth 
principle, and evidence for the collaboration could easily be documented. The remaining five 
principles would be supported by evidence from the studies described in this report. 


Summarizing the Validity Evidence 


As promised earlier in this chapter, the validity studies described in this report will appear less 
daunting when the overlap of studies across the different purposes and components of the 
Smarter Balanced assessment system is accounted for. This integration is presented in Tables 
7 and 8. Table 7 presents brief descriptions of each proposed study in the form of short labels, 
indicates the purposes that each study addresses, and provides a unique number for each 
study. It also lists the page numbers in this document that refer to each study. Table 8 uses this 
numbering system to illustrate the places where such studies are already accounted for in 
current or planned Smarter Balanced activities. Table 8 is also available as an Excel file, so that 
its data can be sorted by columns to facilitate different research planning activities. It may be 
tempting to prioritize the studies based on the number of check marks in each row of Table 7, 
but because the purposes in the columns are not equal in importance, and because the 
contribution of each study to the validity argument will not be equal, such an interpretation 
would be an oversimplification. 




Table 7. Listing of Studies by Source of Evidence and Testing Purpose. 

Study Number and Description | Page Numbers | Evidence Sources | Summative Assessment Purpose (1-7) | Interim Assessment Purpose (1-4) | Formative Assessment Resources Purpose (1-5) 

[The check marks in the evidence source and purpose columns are illegible in this copy; only the legible study descriptions and page numbers are listed.] 

1 [illegible] 
2 Analysis of measurement precision | [illegible], 51-52, 53, 56, 58-59 
3 Audit of test administration | 17, [illegible] 
4-6 [illegible] 
7 Evaluation of fairness | 17, 22, 52, 59 
8 [illegible] 
9 Audit of test security | 17, 24-25, 48 
10 [illegible] | 25-31, 36, 39-40, 46, [illegible] 
11 Evaluating ECD | [illegible] 
12 IRT residual analysis | [illegible] 
13 Reliability and standard error estimation | 17-19, 31-34, 36-37, 50-52, 53-54, 56-59 
14 Cognitive skills and item response time | 24, 35, 54, 56 
15 Cognitive interviews, think aloud | [illegible] 
16 Decision consistency and accuracy | 36-38, 41, 56-57, 59 
17 Cut-score standard error | 36-37 
18 Criterion-related validation of “on track” | [illegible] 
19 Educator interviews, focus groups, surveys | 38-39, 44-45, 49-50, 58-59 
20 Criterion-related validation of readiness | 39-45 
21 [illegible] 
22 [illegible] of enrollment, dropout, courses | 38, 45, 55 
23 Teacher morale surveys | 45, 52, 55 
24 Teacher surveys on changes in students | 45, 49, 52-53, 55, 59, 61-62 
25 Student morale and aspirations surveys | 45, 52 
26 [illegible] 
27 Criterion-related studies re: [illegible] | [illegible] 
28 Follow-up on specific student decisions | 49-50 
29-36 [illegible] 
37 Audit of test accommodations | [illegible] 
38 [illegible] 
39 Differential predictive validity | [illegible] 
40-45 [illegible] 
46 [illegible] interviews, focus groups of [illegible] | [illegible] 
47 Audit of formative resources development and implementation | 61-62 
48 Analysis of usage stats for formative [illegible] | 61-62 
49 Surveys of collaborative leadership | 61-62 
50 [illegible] surveys | [illegible] 
51 Formative assessment user surveys | [illegible] 
52-54 [illegible] 
55 Summary of validity evidence acc. to principles | 64-68 

Note: Evidence Sources. 1 = Test Content, 2 = Response Processes, 3 = Internal Structure, 4 = Relations to Other Variables, 5 = Testing Consequences 




Table 8 (to be populated): Connecting Recommended Studies to Current Activities and RFPs 

Study and Number | Source of Evidence | Contract | Summative Assessments | Interim Assessments | Formative Assessment Resources 

1 TC audit 
2 Meas. precision 
3 Administration audit 
4 Evaluation of scoring 
5 Scaling and equating 
6 Standard setting 
7 Evaluation of fairness 
8 Equity 
9 Audit of test security 
10 Content validity 
11 Evaluating ECD 
12 IRT residual analysis 
13 Reliability and SE 
14 Item response time 
15 Cognitive interviews 
16 DC, DA 
17 Cut-score SE 
18 Criterion-related OT 
19 Educator surveys 
20 Readiness 
21 Postsecondary surveys 
22 Dropout 
23 Teacher morale 
24 Change surveys 
25 Student morale 
26 Vertical scale 
27 Gain (growth) 
28 Student decisions 
29 Sensitivity 
30 Classroom artifacts 
31 Score reports 
32 Report usage rates 
33 Aggregate stats 
34 G-studies 
35 Item parameter drift 
36 UTD and sensitivity 
37 Test accommodations 
38 DIF 
39 Diff. prediction 
40 Invariance 
41 Group differences 
42 MTMM 
43 Scope and sequence 
44 Content clusters 
45 Interim usage 
46 Surveys high users 
47 Formative audit 
48 Formative usage 
49 Collabor. leadership 
50 Educator FA surveys 
51 FA user surveys 
52 Parent/student surveys 
53 Case studies: users 
54 Theory of Action 
55 Summary of validity 



IX. Ongoing Validation Activities and Support Systems 


Validation can be thought of as a great job for a masochist because, in a sense, one can never 
absolutely “prove” that an assessment is totally valid for the complex purposes to which it is put 
(Haertel, 1999). Moreover, because assessments are dynamic, and both they and the populations that 
they assess change over time, validation is an ongoing, essentially perpetual, endeavor. Nonetheless, at 
some point, decisions must be made regarding whether sufficient evidence exists to justify the use 
of a test for a particular purpose. Most of this report has focused on the purpose of conducting 
studies to provide such evidence and documenting the evidence into a coherent validity argument 
that would satisfy professional testing standards, federal peer review, and legal challenges. 
However, our professional responsibilities also require us to think toward the future, beyond the 
current funding for Smarter Balanced, and consider the potential positive and negative 
consequences that should be addressed in longer-range validation studies. 


At this juncture, a few potential validity activities appear in the crystal ball. One is studying the 
degree to which products and processes provided by the Consortium persevere and are used over 
time. The Consortium’s processes, products, and activities are designed to produce an enduring 
collaboration and resources that should outlive the Consortium. Thus, studying the long-term effects 
of Smarter Balanced on instruction, within and outside the Consortium states, would be an 
interesting research area. 


Another area of interest is the specific uses of the Smarter Balanced assessments and formative 
resources beyond the currently anticipated uses. It is quite possible that states, districts, and 
schools will use the assessments for purposes that they think are useful and valid, but that are not 
currently anticipated. Some of these uses may be appropriate and creative; others may be 
problematic or even damaging. States and districts will certainly use some assessments and tools 
for educator accountability, and so the validity of such use is an area in need of future research. 


Although all important areas of future research cannot be anticipated at this time, it is still wise to 
consider the support systems that Smarter Balanced can put in place to facilitate future validity 
research. For example, other large-scale assessment programs, such as NAEP, TIMSS, and PISA, 
make data available for secondary analyses. Occasionally, these programs provide grant money to 
support such secondary analyses. The types of studies to be funded can be specified in advance, or, 
preferably, applicants for funding could be asked to submit their own ideas for research to study 
what they believe are important validity questions. 


Another example of a support system is the College Board’s “validity research study service.” This 
service is essentially a data-sharing agreement between the College Board and postsecondary 
institutions, whereby the institutions can send course grade information to the College Board and it 
will match the data with SAT scores and other College Board assessment scores. These matched 
data sets can then be used to conduct local validity studies for each institution. 
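
A local validity study built on such matched data might look like the sketch below. It assumes records are keyed by a shared student identifier and fits a simple least-squares prediction line from test score to course grade, which is one common form of criterion-related evidence. The identifiers, scores, grades, and function names are hypothetical and do not describe the College Board's actual service.

```python
def match_records(test_scores, course_grades):
    """Inner-join two {student_id: value} mappings on student ID."""
    common = sorted(test_scores.keys() & course_grades.keys())
    return [(sid, test_scores[sid], course_grades[sid]) for sid in common]


def least_squares_line(pairs):
    """Slope and intercept for predicting y from x by ordinary least squares."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: test scores held by the testing program and course
# grades supplied by one institution. Unmatched students are dropped.
test_scores = {"s1": 480, "s2": 520, "s3": 600, "s4": 650, "s5": 700}
course_grades = {"s1": 2.0, "s2": 2.5, "s3": 3.0, "s4": 3.5, "s6": 4.0}

matched = match_records(test_scores, course_grades)
slope, intercept = least_squares_line([(score, grade) for _, score, grade in matched])
print(len(matched), round(slope, 5), round(intercept, 3))
```

Only students present in both files contribute to the estimate, so the share of unmatched records is worth reporting alongside the results.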


By considering potential validity studies that will be important in the future, and by establishing 
research support systems, the Consortium can ensure that validity research for Smarter Balanced outlives 
the formal research studies that will comprise the documented validity argument for the Consortium. 




References 


Achieve, Inc. (2006). An alignment analysis of Washington state’s college readiness mathematics 
standards with various local placement tests. Cambridge, MA: Author. 


ACT. (2005a). Developing achievement levels on the 2005 National Assessment of Educational 
Progress in grade 12 mathematics. Process report. Iowa City, IA: Author. 


ACT. (2005b). Developing achievement levels on the 2005 National Assessment of Educational 
Progress in grade 12 mathematics. Special studies report. Iowa City, IA: Author. 


ACT. (2005c). Developing achievement levels on the 2005 National Assessment of Educational 
Progress in grade 12 mathematics. Technical report. Iowa City, IA: Author. 


ACT. (2006). Ready for college, ready for work. Same or different? Iowa City, IA: Author. 


ACT. (2010). Issues in college readiness. What are ACT’s college readiness benchmarks? Iowa City, 
IA: Author. 


Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and 
Psychological Measurement, 40, 955-959. 


Allen, J., & Sconing, J. (2005). Using ACT assessment scores to set benchmarks for college readiness 
(ACT Research Report Series 2005-3). Iowa City, IA: ACT. 


American Educational Research Association (AERA), American Psychological Association (APA), & 
National Council on Measurement in Education (NCME). (1999). Standards for educational 
and psychological testing. Washington, DC: American Educational Research Association. 


American Evaluation Association. (2004). Guiding principles for evaluators. Retrieved from 
http://www.eval.org/publications/guidingprinciples.as 


Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary 
theory of action for summative and formative assessment. Measurement, 8, 70-91. 


Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states' content standards: 
Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21-29. 


Bock, R. D., Gibbons, R. D., & Muraki, E. (1988). Full-information factor analysis. Applied 
Psychological Measurement, 12, 261-280. 


Brennan, R. L. (2002). Estimated standard error of a mean when there are only two observations 
(CASMA Technical Note Number 1). Iowa City, IA: Center for Advanced Studies in 
Measurement and Assessment, University of Iowa. 


Brennan, R. L. (2011). Using generalizability theory to address reliability issues for PARCC 
assessments: A white paper. Iowa City, IA: Center for Advanced Studies in Measurement and 
Assessment. 




Briggs, D. C. (2012, April). Making inferences about growth and value-added: Design issues for the 
PARCC consortium. Paper presented at the annual meeting of the National Council on 
Measurement in Education, Vancouver, BC. 


Briggs, D. C. (2013). Measuring growth with vertical scales. Journal of Educational Measurement, 
50(2), 204-226. 


Camara, W. (2012, April). Defining and measuring college and career readiness: Developing 
performance level descriptors and defining criteria. Paper presented at the annual meeting 
of the National Council on Measurement in Education, Vancouver, BC. Retrieved from 
http://research.collegeboard.org/sites/default/files/publications/2012/7/presentation-2012-3-developing-performance-level-descriptors-criteria.pdf 


Camara, W. J. (2013). Defining and measuring college and career readiness: A validation framework. 
Educational Measurement: Issues and Practice, 32(4), 16-27. 


Camara, W., & Quenemoen, R. (2012). Defining and measuring college and career readiness and 
informing the development of performance level descriptors. Retrieved from 
http://www.parcconline.org/sites/parcc/files/PARCC%20CCR%20paper%20v14%201-8-12.pdf 


Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait- 
multimethod matrix. Psychological Bulletin, 56, 81-105. 


Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differential item 
functioning test items. Educational Measurement: Issues and Practice, 17, 31-44. 


Conley, D. T., Drummond, K. V., Gonzalez, A., Rooseboom, J., & Stout, O. (2011). Reaching the goal: 
The applicability and importance of the Common Core State Standards to college and career 
readiness. Eugene, OR: Educational Policy Improvement Center. 


Crocker, L. M., Miller, D., & Franks, E. A. (1989). Quantitative methods for assessing the fit between 
test and curriculum. Applied Measurement in Education, 2, 179-194. 


Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. I. Braun (Eds.), 
Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum. 


Crotts, K., Sireci, S. G., & Zenisky, A. L. (2012). Evaluating the content quality of a multistage- 
adaptive test. Journal of Applied Testing Technology, 13(1), 1-26. 


D’Agostino, J. V., & Bonner, S. M. (2009). High school exit exam scores and university performance. 
Educational Assessment, 14, 25-47. 


D’Agostino, J., Karpinski, A., & Welsh, M. (2011). A method to examine content domain structures. 
International Journal of Testing, 11(4), 295-307. 


D’Agostino, J. V., Welsh, M., & Corson, N. M. (2007). Instructional sensitivity of a state’s standards- 
based assessment. Educational Assessment, 12, 1-22. 


Davis-Becker, S. L., Buckendahl, C. W., & Gerrow, J. (2011). Evaluating the bookmark standard
setting method: The impact of random item ordering. International Journal of Testing, 11(1),
24-37.




Day, S. X., & Rounds, J. (1998). Universality of vocational interest structure among racial and ethnic 
minorities. American Psychologist, 53, 728-736.


Delisle, D. S. (2012). Letter to chief state school officers. Washington, DC: U.S. Department of Education.


Dorans, N. J. (2004). Using subpopulation invariance to assess test score equity. Journal of
Educational Measurement, 41, 43-68. 


Duncan, G. D., del Rio Parant, L., Chen, W.-H., Ferrara, S., Johnson, E., Oppler, S., & Shieh, Y.-Y. 
(2005). Study of a dual-language test booklet in eighth-grade mathematics. Applied
Measurement in Education, 18, 129-161.


Educational Testing Service (ETS). (2012a). SmarterO9 component 1: Narrative key to understanding 
the blueprint tables (TAC draft). Princeton, NJ: Author. 


Educational Testing Service (ETS). (2012b). Smarter Balanced Assessment Consortium. Bias and 
sensitivity guidelines. Princeton, NJ: Author. 


Educational Testing Service (ETS). (2012c). Specifications for an interim system of assessment.
Princeton, NJ: Author. 


Embretson (Whitley), S. (1983). Construct validity: construct representation versus nomothetic span. 
Psychological Bulletin, 93, 179-197. 


Ercikan, K., & Koh, K. (2005). Examining the construct comparability of the English and French 
versions of TIMSS. International Journal of Testing, 5(1), 23-35.


Fields, R., & Parsad, B. (2012). Tests and cut scores used for student placement in postsecondary
education: Fall 2011. Washington, DC: National Assessment Governing Board. 


Fuchs, L. S., Fuchs, D., Eaton, S. B., Hamlett, C. L., & Karns, K. M. (2000). Supplementing teacher 
judgments of mathematics test accommodations with objective data. School Psychology
Review, 29, 65-85.


Haertel, E. H. (1999). Validity arguments for high-stakes testing: In search of the evidence. 
Educational Measurement: Issues and Practice, 18(4), 5-9.


Hambleton, R. K. (1980). Test score validity and standard setting methods. In R. A. Berk (Ed.), 
Criterion-referenced measurement: The state of the art. Baltimore, MD: Johns Hopkins
University Press. 


Hambleton, R. K. (1989). Principles and applications of item response theory. In R. L. Linn (Ed.), 
Educational measurement (3rd ed., pp. 147-200). New York, NY: Macmillan. 


Hambleton, R. K. (2005). Issues, designs and technical guidelines for adapting tests into multiple 
languages and cultures. In R. K. Hambleton, P. F. Merenda, & C. D. Spielberger (Eds.), 
Adapting psychological and educational tests for cross-cultural assessment (pp. 3-38). 
Hillsdale, NJ: Lawrence Erlbaum. 


Hambleton, R. K., & Han, N. (2004). Summary of our efforts to compute decision consistency and
decision accuracy estimates using IRT item statistics (Center for Educational Assessment
Research Report No. 552). Amherst, MA: University of Massachusetts, Center for Educational
Assessment.


Hambleton, R. K., & Rovinelli, R. J. (1986). Assessing the dimensionality of a set of test items.
Applied Psychological Measurement, 10, 287-302. 


Hambleton, R. K., & Slater, S. (1997). Are NAEP executive summary reports understandable to policy 
makers and educators? (CSE Technical Report 430). Los Angeles, CA: National Center for 
Research on Evaluation, Standards, & Student Testing. 


Hamilton, L. S. (1994, April). Validating hands-on science assessments through an investigation of 
response processes. Paper presented at the Annual Meeting of the American Educational 
Research Association, New Orleans, LA. 


Hattie, J. A. (1985). Methodology review: Assessing unidimensionality of a set of test items. Applied
Psychological Measurement, 9, 139-164.


Hill, R. K., & DePascale, C. A. (2002). Determining the reliability of school scores. Dover, NH: National 
Center for the Improvement of Educational Assessment. 


Hill, R. K., & DePascale, C. A. (2003, April). Adequate yearly progress under NCLB: Reliability 
considerations. Paper presented at the annual meeting of the National Council of 
Measurement in Education, Chicago, IL. 


Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence 
Erlbaum. 


International Test Commission (ITC). (2010). Guidelines for translating and adapting tests. Retrieved
from http://www.intestcom.org


Johnstone, C. J., Altman, J. M., & Thurlow, M. (2006). A state guide to the development of universally
designed assessments. Minneapolis, MN: University of Minnesota, National Center on
Educational Outcomes. Retrieved from
http://www.cehd.umn.edu/nceo/OnlinePubs/StateGuideUD/default.htm


Kaira, L. T., & Sireci, S. G. (2010). Evaluating content validity in multistage adaptive testing. CLEAR 
Exam Review, 21(2), 15-23. 


Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527-535. 


Kane, M. T. (1994). Validating the performance standards associated with passing scores. Review of 
Educational Research, 64, 425-461. 


Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting 
standards. In G. Cizek (Ed.), Setting performance standards: Concepts, methods and
perspectives (pp. 53-88). Mahwah, NJ: Erlbaum. 


Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). 
Washington, DC: American Council on Education/Praeger. 




Keng, L., Murphy, D., & Gaertner, M. (2012, April). Supported by data: A comprehensive approach for
building empirical evidence for standard setting. Paper presented at the annual meeting of 
the National Council on Measurement in Education, Vancouver, BC. 


Kolen, M. J. (2011). Issues associated with vertical scales for PARCC assessments. Retrieved from
http://www.parcconline.org/sites/parcc/files/PARCCVertScal289-12-201129.pdf 


Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices
(2nd ed.). New York, NY: Springer-Verlag. 


La Salle, A., Munoz, C., Ruff, L., Weisman, E., Sedillo, R., & Phillips, L. (2012, April). Grounded in the 
content: The role of content analysis in evidence-based standard setting. Paper presented at 
the annual meeting of the National Council on Measurement in Education, Vancouver, BC. 


Lee, W. (2008). Classification consistency and accuracy for complex assessments using item 
response theory (CASMA Research Report No. 27). Iowa City, IA: University of Iowa.


Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of 
verbal reports in educational achievement testing. Educational Measurement: Issues and
Practice, 23(4), 6-15.


Liang, T., Han, K. T., & Hambleton, R. K. (2008). User’s guide for ResidPlots-2: Computer software for
IRT graphical residual analyses, Version 2.0 (Center for Educational Assessment Research 
Report No. 688). Amherst, MA: University of Massachusetts, Center for Educational 
Assessment. 


Liang, T., Han, K. T., & Hambleton, R. K. (2009). ResidPlots-2: Computer software for IRT graphical
residual analyses. Applied Psychological Measurement, 33(5), 411-412.


Linacre, J. M. (2004). A user’s guide to Winsteps Rasch-model computer programs. Chicago, IL:
MESA Press. 


Linn, R. L. (2006). The standards for educational and psychological testing: Guidance in test 
development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp.
27-38). Mahwah, NJ: Lawrence Erlbaum.


Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: Implications of 
requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31(6), 3-16. 


Livingston, S. A., & Lewis, C. (1995). Estimating the consistency and accuracy of classifications 
based on test scores. Journal of Educational Measurement, 32, 179-197. 


Loomis, S. C. (2011, April). Toward a validity framework for reporting preparedness of 12th graders
for college-level course placement and entry to job training programs. Paper presented at the 
annual meeting of the National Council on Measurement in Education, New Orleans, LA. 


Marion, S., White, C., Carlson, D., Erpenbach, W. J., Rabinowitz, S., Sheinker, J., & Council of Chief 
State School Officers (CCSSO). (2002). Making valid and reliable decisions in determining 
adequate yearly progress. A paper in the series: Implementing the State Accountability 
System Requirements under the No Child Left Behind Act of 2001. Washington, DC: CCSSO. 




Marsh, H. W., Martin, A. J., & Jackson, S. (2010). Introducing a short version of the physical self 
description questionnaire: New strategies, short-form evaluative criteria, and applications of 
factor analyses. Journal of Sport & Exercise Psychology, 32(4), 438-482.


Martone, A., & Sireci, S. G. (2009). Evaluating alignment between curriculum, assessments, and 
instruction. Review of Educational Research, 79(4), 1332-1361.


Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed.). Washington, DC: 
American Council on Education. 


Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning (CRESST Report 752). 
Los Angeles, CA: National Center for Research on Evaluation, Standards, & Student Testing. 


Mislevy, R. J., & Riconscente, M. M. (2006). Evidence-centered assessment design. In S. M. Downing 
& T. M. Haladyna (Eds.), Handbook of test development (pp. 61-90). Mahwah, NJ: Lawrence
Erlbaum. 


National Assessment Governing Board (NAGB). (2010). Program of preparedness research study 
brief. Retrieved from 
http://www.nagb.org/content/nagb/assets/documents/newsroom/press-releases/2010/release-20101122/research.pdf


National Council on Measurement in Education (NCME). (2012). Testing and data integrity in the
administration of statewide student assessment programs. Madison, WI: Author. 


National Governors Association Center for Best Practices (NGA Center) & CCSSO. (2010). Common 
Core State Standards for mathematics. Washington, DC: Authors. 


Northwest Evaluation Association. (2005). Technical manual: For use with Measures of Academic
Progress and achievement level tests. Lake Oswego, OR: Author. 


O’Malley, K., Keng, L., & Miles, J. (2012). Using validity evidence to set performance standards. In 
G. J. Cizek (Ed.), Setting performance standards (2nd ed.) (pp. 301-322). New York, NY: 
Routledge. 


O’Neil, T., Sireci, S. G., & Huff, K. F. (2004). Evaluating the consistency of test content across two 
successive administrations of a state-mandated science assessment. Educational
Assessment, 9, 129-151.


Patz, R. J. (2007). Vertical scaling in standards-based educational assessment and accountability 
systems. Washington, DC: CCSSO. 


Penfield, R. D., & Miller, J. M. (2004). Improving content validation studies using an asymmetric 
confidence interval for the mean of expert ratings. Applied Measurement in Education, 17(4),
359-370.


Pitoniak, M. J., Sireci, S. G., & Luecht, R. M. (2002). A multitrait-multimethod validity investigation of 
scores from a professional licensure exam. Educational and Psychological Measurement, 62, 
498-516. 


Popham, W. J. (1992). Appropriate expectations for content judgments regarding teacher licensure
tests. Applied Measurement in Education, 5, 285-301.




Porter, A. C., & Smithson, J. L. (2002, April). Alignment of assessments, standards and instruction 
using curriculum indicator data. Paper presented at the Annual Meeting of the American 
Educational Research Association, New Orleans, LA. 


Rabinowitz, S., Zimmerman, J., & Sherman, K. (2001). Do high-stakes tests drive up student dropout 
rates? Myths versus realities (Research brief). San Francisco, CA: WestEd. 


Ramsey, P. A. (1993). Sensitivity review: The ETS experience as a case study. In P. W. Holland & H. 
Wainer (Eds.), Differential item functioning (pp. 367-388). Hillsdale, NJ: Erlbaum. 


Reckase, M. D. (2006a). A conceptual framework for a psychometric theory for standard setting with 
examples of its use for evaluating the functioning of the standard setting methods. 
Educational Measurement: Issues and Practice, 25(2), 4-18. 


Reckase, M. D. (2006b). Rejoinder: Evaluating standard setting methods using error models 
proposed by Schultz. Educational Measurement: Issues and Practice, 25(3), 14-17. 


Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language 
versions of a credentialing exam. International Journal of Testing, 3, 1-20.


Rothman, R. (2003). Imperfect matches: The alignment of standards and tests. Washington, DC:
National Research Council. 


Rudner, L. M. (2001). Computing the expected proportions of misclassified examinees. Practical
Assessment, Research & Evaluation, 7(14).


Rudner, L. M. (2004). Expected classification accuracy. Paper presented at the annual meeting of
the National Council on Measurement in Education, San Diego, CA. 


Select Committee. (1994/1995). Sampling and statistical procedures used in the California Learning 
Assessment System. In L. J. Cronbach (Ed.), A valedictory: Reflections on 60 years in
educational testing (Board Bulletin). Washington, DC: National Research Council, Board on 
Testing and Assessment. 


Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19, 405-450. 


Silk, Y., Silver, D., Amerian, S., Nishimura, C., & Boscardin, C. K. (2009). Using classroom artifacts to 
measure the efficacy of a professional development (CRESST Report 761). Los Angeles, CA: 
National Center for Research on Evaluation, Standards, & Student Testing. 


Simpson, M., Gong, B., Marion, S., National Center on Educational Outcomes, CCSSO, & National 
Association of State Directors of Special Education. (2006). Effect of minimum cell sizes and 
confidence interval sizes for special education subgroups on school-level AYP determinations 
(Synthesis Report 61). Minneapolis, MN: National Center on Educational Outcomes, 
University of Minnesota. 


Sireci, S. G. (1997). Problems and issues in linking tests across languages. Educational 
Measurement: Issues and Practice, 16(1), 12-19. 


Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational Assessment, 5, 299- 
321. 




Sireci, S. G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again.
In R. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 19-37).
Charlotte, NC: Information Age Publishing Inc.


Sireci, S. G., & Geisinger, K. F. (1992). Analyzing test content using cluster analysis and 
multidimensional scaling. Applied Psychological Measurement, 16, 17-31.


Sireci, S. G., & Geisinger, K. F. (1995). Using subject matter experts to assess content 
representation: An MDS analysis. Applied Psychological Measurement, 19, 241-255. 


Sireci, S. G., Han, K. T., & Wells, C. S. (2008). Methods for evaluating the validity of test scores for 
English language learners. Educational Assessment, 13, 108-131. 


Sireci, S. G., Hauger, J. B., Wells, C. S., Shea, C., & Zenisky, A. L. (2009). Evaluation of the standard 
setting on the 2005 grade 12 National Assessment of Educational Progress mathematics 
test. Applied Measurement in Education, 22, 339-358.


Sireci, S. G., & Mullane, L. A. (1994). Evaluating test fairness in licensure testing: The sensitivity 
review process. CLEAR Exam Review, 5(2), 22-28. 


Sireci, S. G., Robin, F., Meara, K., Rogers, H. J., & Swaminathan, H. (2000). An external evaluation of 
the 1996 grade 8 NAEP science framework. In N. Raju, J. W. Pellegrino, M. W. Bertenthal, 
K. J. Mitchell, & L. R. Jones (Eds.), Grading the nation’s report card: Research from the 
evaluation of NAEP (pp. 74-100). Washington, DC: National Academies Press.


Sireci, S. G., Rogers, H. J., Swaminathan, H., Meara, K., & Robin, F. (2000). Appraising the 
dimensionality of the 1996 grade 8 NAEP science assessment data. In N. Raju, J. W. 
Pellegrino, M. W. Bertenthal, K. J. Mitchell, & L. R. Jones (Eds.), Grading the nation’s report 
card: Research from the evaluation of NAEP (pp. 101-122). Washington, DC: National
Academies Press. 


Sireci, S. G., & Schweid, J. A. (2011, April). Beyond alignment: Important questions to ask (and 
answer) to evaluate content validity. Paper presented at the annual meeting of the American 
Educational Research Association, New Orleans, LA. 


Sireci, S. G., & Wells, C. S. (2010). Evaluating the comparability of English and Spanish video
accommodations for English language learners. In P. Winter (Ed.), Evaluating the
comparability of scores from achievement test variations (pp. 33-68). Washington, DC: 
CCSSO. 


Smarter Balanced Assessment Consortium (Smarter Balanced). (2010). Race to the Top assessment 
program application for new grants: Comprehensive assessment systems, CFDA Number: 
84.395B. OMB Control Number 1810-0699.


Smarter Balanced Assessment Consortium (Smarter Balanced). (2012a). Master work plan—
formative. Retrieved from
http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/03/Formative-Assessment-Master-Work-Plan-Narrative.pdf


Smarter Balanced Assessment Consortium (Smarter Balanced). (2012b). Theory of action: An
excerpt from the Smarter Balanced Race to the Top application. Retrieved from
http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/02/Smarter-Balanced-Theory-of-Action.pdf




Smarter Balanced Assessment Consortium (Smarter Balanced). (n.d.). Frequently asked questions.
Retrieved from http://www.smarterbalanced.org/resources-events/faqs/


Tomlinson, M. R., & Fortenberry, N. (2008, October). Classroom artifacts: Tools to assess the use of
active, innovative, and engineering pedagogies among engineering faculty. Paper presented
at the annual ASEE/IEEE Frontiers in Education conference, Saratoga Springs, NY. Retrieved
from http://fie-conference.org/fie2008/papers/1088.pdf


Tong, Y., & Kolen, M. J. (2007). Comparisons of methodologies and results in vertical scaling for 
educational achievement tests. Applied Measurement in Education, 20(2), 227-253.


U.S. Department of Education. (2009a). Race to the Top program executive summary. Washington, 
DC: Author. 


U.S. Department of Education. (2009b). Standards and assessments peer review guidance: 
Information and examples for meeting requirements of the No Child Left Behind Act of 2001. 
[Revised December 21, 2007 to include modified academic achievement standards. Revised
with technical edits, January 12, 2009]. Washington, DC: Author.


U.S. Department of Education. (2010). U.S. secretary of education Duncan announces winners of
competition to improve student assessments. Retrieved from


competition- improve-student- asse 


Vasavada, N., Carman, E., Hart, B., & Luisser, D. (2010). Common Core State Standards alignment: 
ReadiStep, PSAT/NMSQT, and SAT (Research Report 2010-5). New York, NY: The College
Board. 


Wainer, H. (2011). Uneducated guesses: Using evidence to uncover misguided education policies. 
Princeton, NJ: Princeton University Press. 


Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP 
results: A redesign and validity study. Journal of Educational Measurement, 36, 301-335. 


Webb, N. L. (1999). Alignment of science and mathematics standards and assessments in four 
states (Research Monograph No. 18). Washington, DC: CCSSO. 


Webb, N. L. (2007). Issues related to judging the alignment of curriculum standards and 
assessments. Apolied Measurement in Education, 201), 7-25. 


Welch, C., & Dunbar, S. (2011, April). K-12 assessments and college readiness: Necessary validity
evidence for educators, teachers, and parents. Paper presented at the annual meeting of the 
National Council on Measurement in Education, New Orleans, LA. 


Wells, C. S., Baldwin, S., Hambleton, R. K., Sireci, S. G., Karantonis, A., & Jirka, S. (2009). Evaluating 
score equity assessment for state NAEP. Applied Measurement in Education, 22, 394-408.


Wilson, D., Wood, R., & Gibbons, R. D. (1991). TESTFACT: Test scoring, item statistics, and item factor
analysis [computer program]. Mooresville, IN: Scientific Software. 




Williams, N. J., Keng, L., & O’Malley, K. (2012, April). Maximizing panel input: Incorporating empirical 
evidence in a way the standard setting panel will understand. Paper presented at the annual 
meeting of the National Council on Measurement in Education, Vancouver, BC. 


Wyatt, J., Kobrin, J., Wiley, A., Camara, W. J., & Proestler, N. (2011). SAT benchmarks: Development
of a college readiness benchmark and its relationship to secondary and postsecondary 
school performance. (Research Report 2011-5). New York, NY: The College Board. 


Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation
standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage. 


Yen, W. M. (1997). The technical quality of performance assessments: Standard errors of percents of 
pupils reaching standards. Educational Measurement: Issues and Practice, 16(3), 5-15.


Young, J., Pitoniak, M. J., King, T. C., & Ayad, E. (2012). Smarter Balanced Assessment Consortium: 
Guidelines for accessibility for English language learners. Retrieved from 
http://www.smarterbalanced.org/smarter-balanced-assessments 


Zwick, R., & Schlemer, L. (2004). SAT validity for linguistic minorities at the University of California, 
Santa Barbara. Educational Measurement: Issues and Practice, 23(1), 6-16.




Appendix A: Smarter Balanced Theory of Action and Derivation of Purpose Statements 


Smarter Balanced Assessment Consortium Theory of Action 
Bennett (2010) described a Theory of Action (TOA) as follows: 


Theory of Action is a common notion in the program evaluation literature . . . appearing to 
have come about because program managers were too often unclear about the intended 
goals of their efforts. The term is closely associated with logic model, a graphical or textual
description of an intervention that explains the cause-effect relationships among inputs, 
activities, and intended outcomes. (pp. 70-71) 


Smarter Balanced’s TOA is well articulated in its Race to the Top application (Smarter Balanced, 
2010) and has been excerpted from the application as a separate document available on the SBAC 
website (Smarter Balanced, 2012b). It begins by stating that Smarter Balanced “supports the 
development and implementation of learning and assessment systems to radically reshape the 
education enterprise . . . to improve student outcomes” and states that “the overarching goal of the 
Smarter Balanced Assessment Consortium is to ensure that all students leave high school prepared
for postsecondary success in college or a career through increased student learning and improved
teaching” (p. 1; emphasis in original). The TOA lists “seven principles undergirding the theory of
action” (p. 1). These principles are: 


1. Assessments are grounded in a thoughtful, standards-based curriculum and are managed as 
part of an integrated system of standards, curriculum, assessment, instruction, and teacher 
development. 


2. Assessments produce evidence of student performance on challenging tasks that evaluate the 
Common Core State Standards. 


3. Teachers are integrally involved in the development and scoring of assessments. 


4. The development and implementation of the assessment system is a state-led effort with a
transparent and inclusive governance structure.


5. Assessments are structured to continuously improve teaching and learning.


6. Assessment, reporting, and accountability systems provide useful information on multiple
measures that is educative for all stakeholders.


7. Design and implementation strategies adhere to established professional standards. (Smarter
Balanced, 2010, pp. 32-33)


From these principles we can immediately infer that intended goals of Smarter Balanced are to 
develop quality assessments that are aligned with the CCSS, are part of a system that supports 
instruction and student learning, and provide results that are useful for evaluating student 
performance. It is also clear that other goals are to involve teachers throughout the test development 
and scoring processes and to operate as a true collaborative with states working in unison toward 
these common goals. 


The model that Smarter Balanced established to meet these goals involves three different 
components: (a) Summative assessments, (b) interim-benchmark assessments, and (c) formative 
assessment resources. A schematic representation of the Smarter Balanced TOA is illustrated in 
Figure A-1, which is taken directly from the Smarter Balanced Race to the Top application (Smarter 
Balanced, 2010). This representation includes the three assessment components, but also 
illustrates the other components that are required for the Consortium members to work together in 
unison and to reach the “overarching goal” found on the right side of the figure. Related to the 
Theory of Action are the overall and specific claims for the summative assessments, which are 
presented in Table A-1. 




Figure A-1. Overview of Smarter Balanced Assessment Consortium Theory of Action 


Source: Smarter Balanced (2012b).




Table A-1. Overall and Specific Claims for Smarter Balanced Summative Assessments


| Claim Type | ELA: Students can... | Mathematics: Students can... |
|---|---|---|
| Overall: Grades 3-8 | demonstrate progress toward college and career readiness in English language arts and literacy. | demonstrate progress toward college and career readiness in mathematics. |
| Overall: Grade 11 | demonstrate college and career readiness in English language arts and literacy. | demonstrate college and career readiness in mathematics. |
| Specific | read closely and analytically to comprehend a range of increasingly complex literary and informational texts. | explain and apply mathematical concepts and interpret and carry out mathematical procedures with precision and fluency. |
| Specific | produce effective and well-grounded writing for a range of purposes and audiences. | solve a range of complex, well-posed problems in pure and applied mathematics, making productive use of knowledge and problem-solving strategies. |
| Specific | employ effective speaking and listening skills for a range of purposes and audiences. | clearly and precisely construct viable arguments to support their own reasoning and to critique the reasoning of others. |
| Specific | engage in research and inquiry to investigate topics, and to analyze, integrate, and present information. | analyze complex, real-world scenarios and construct and use mathematical models to interpret and solve problems. |


Appendix B: Description of Alignment Methods 


| Alignment Model | Dimension | Brief Description |
|---|---|---|
| Webb (1997) | *Categorical Concurrence | Match of items to general content areas |
| Webb (1997) | **Depth of Knowledge Consistency | Cognitive level of items compared to cognitive level of benchmark/objective |
| Webb (1997) | **Range of Knowledge Correspondence | Number of benchmarks/objectives measured within general content area |
| Webb (1997) | **Balance of Representation | Distribution of items across general content areas |
| Achieve (2006) | Content Centrality | Congruence between item and objective/benchmark |
| Achieve (2006) | *Performance Centrality | Congruence between cognitive demand of item and objective/benchmark |
| Achieve (2006) | **Source of Challenge | Grade-level appropriateness |
| Achieve (2006) | Level of Cognitive Demand | Cognitive level measured by item |
| Achieve (2006) | **Level of Challenge | Degree to which test captures difficulty implied by general content areas |
| Achieve (2006) | **Balance | Holistic evaluation of how well test represents content/cognitive specs |
| Achieve (2006) | **Range | Proportion of objectives/benchmarks measured within general content area |
| SEC (Porter et al., 2001) | *Content Match | Match of items to content areas and cognitive levels |
| SEC (Porter et al., 2001) | **Expectations for Student Performance | Compares cognitive demands of curriculum and assessment |
| SEC (Porter et al., 2001) | **Instructional Content | Compares what is taught with what is tested |


*Covered or partially covered by one or more traditional content validation approaches.
**Unique contribution of alignment method.
From Sireci & Schweid (2011).




Appendix C: Description of Item Similarity Rating Approach to Evaluating Test Content 


As stated earlier, a disadvantage of this approach to blueprint confirmation is that it may foster 
social desirability—that is, by informing SMEs of the intended CCSS measured by each item, it may 
unconsciously bias their ratings in support of item/standard congruence. To avoid this potential 
confound, and to determine whether other relations among the items are present that are not 
described in the test specifications, the item similarity rating task described earlier could be 
conducted. An example of this task is presented in Figure C-1. An example of some of the results 
from this type of study (from Sireci, Robin, Meara, Rogers, & Swaminathan, 2000) is presented in 
Figure C-2. These results could be followed up by cluster analyses, to see if the items cluster as 
intended by the test specifications. 
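

The follow-up cluster analysis mentioned above can be sketched in a few lines. This is a minimal illustration only, not the procedure used by Sireci et al. (2000): the four items, their mean dissimilarity ratings, the blueprint labels, and the function names are all hypothetical, and average linkage is just one of several reasonable clustering rules.

```python
from itertools import combinations


def average_linkage_clusters(dissim, n_clusters):
    """Agglomerative clustering (average linkage) on a symmetric
    item-by-item dissimilarity matrix. Returns sorted lists of item indices."""
    clusters = [[i] for i in range(len(dissim))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean dissimilarity over all cross-cluster item pairs.
            total = sum(dissim[i][j] for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


def clusters_match_blueprint(clusters, blueprint):
    """True when every cluster holds items from a single blueprint category."""
    return all(len({blueprint[i] for i in cluster}) == 1 for cluster in clusters)


# Hypothetical mean SME ratings (higher = more different), 4 items.
MEAN_DISSIMILARITY = [
    [0, 2, 8, 9],
    [2, 0, 7, 8],
    [8, 7, 0, 1],
    [9, 8, 1, 0],
]
# Hypothetical content categories assigned by the test specifications.
BLUEPRINT = ["A", "A", "B", "B"]

clusters = average_linkage_clusters(MEAN_DISSIMILARITY, n_clusters=2)
print(clusters)
print(clusters_match_blueprint(clusters, BLUEPRINT))
```

If the clusters recovered from the SME ratings do not line up with the blueprint categories, that is a signal to look more closely at the items involved, since the raters were never told which standard each item was written to measure.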


Given that the item similarity rating task requires more SME time and more complex data analysis, 
we recommend that all items be rated for congruence using an alignment-type rating task similar to 
that illustrated in Exhibit 1. However, the similarity rating procedure provides a more stringent test 
and protects against confirmationist bias (social desirability), and so should be considered as a
supplementary study, perhaps using a subset of items.


Figure C-1. Example of Item Similarity Rating Task 


Directions: Please review each pair of items and rate how similar the two items are to one another in terms of 
the mathematics knowledge and skills measured using the rating scale provided. 


[Figure: screenshot of a sample item pair shown with a rating scale anchored by "Very Similar" and "Very Different".]

Item 1: Five swimmers compete in the 50-meter race. The finish time for each swimmer is shown in the video. Explain how the results of the race would change if the race used a clock that rounded to the nearest tenth.

Item 2: The two-eyed space creatures, three-eyed space creatures, and four-eyed space creatures are having a contest to create a group with 24 total eyes. How many two-eyed space creatures are needed to make a group with 24 total eyes?




Figure C-2. Example of Results from Item Similarity Ratings Study 


[Figure: plot of items on two dimensions. Axes: Dimension 1 (Conceptual Understanding) and Dimension 2 (Item Format).]


C=Concept. Understand., P=Pract. Reason., S=Sci. Investig.


Source: Sireci et al., 2000. 




Appendix D: Description of ResidPlots2: IRT Residual Analysis Software 


ResidPlots-2 (Liang, Han, & Hambleton, 2008, 2009) is a software program for evaluating the fit of 
item response theory (IRT) models to data. By comparing observations to model-predicted 
expectations, ResidPlots-2 works at the item level to provide researchers with information to 
determine how well an IRT model fits a given data set. The approach used in ResidPlots-2 is to first 
compute model fit statistics using the observed data, and then also use item and ability estimates 
from IRT estimation programs, such as BILOG-MG, PARSCALE, and MULTILOG, to simulate examinee 
response data and report the average from 10 replications of the simulation. Thus, simulation 
results obtained in this way better approximate the expected observed test score distribution. 
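

The comparison of observations with model-predicted expectations can be illustrated with a short sketch. It assumes the common textbook definition of a standardized residual for one ability interval (observed proportion correct minus the model probability, divided by the binomial standard error) and a three-parameter logistic model with the conventional scaling constant D = 1.7. The function names and the numbers are hypothetical, and this is not a description of the exact computations inside ResidPlots-2.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model
    (a = discrimination, b = difficulty, c = lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def standardized_residual(observed_prop, expected_prob, n):
    """(observed - expected) / binomial standard error for an interval
    containing n examinees."""
    se = math.sqrt(expected_prob * (1.0 - expected_prob) / n)
    return (observed_prop - expected_prob) / se


# Hypothetical interval: 100 examinees near theta = 0, 70% answered correctly.
expected = p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
sr = standardized_residual(observed_prop=0.70, expected_prob=expected, n=100)
print(round(expected, 3), round(sr, 3))
```

Repeating this calculation across ability intervals for each item gives the raw material for item-level residual plots, and running the same calculation on data simulated from the fitted model gives a baseline for how large the residuals would be if the model fit perfectly.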


The output from ResidPlots-2 takes the forms of both graphs and tables. Plots generated by 
ResidPlots-2 include: 


Item-level plots (raw residual plots, standardized residual plots);

Test-level plots (standardized residual distributions [both cumulative distribution function (CDF) and
probability density function (PDF)], item and score fit plots from empirical and simulated data);
and

Score plots (observed and predicted test score distributions).


ResidPlots-2 also generates six tables of results: 


The FIT STAT table provides results for two fit statistics at the item level (chi square, G square) as
well as degree of freedom and fit probability for both, and basic item details (item number, 
parameter estimates, and sample size). 

The SR PDF table lists details of the standardized residual (SR) distribution for the PDF, with 
mean, standard deviation, and relative frequency of the SR distribution. These results are 
provided for the overall test and broken out by format (dichotomous and polytomous items) and 
for both observed and simulated data. 

The SR CDF table is a companion table to the SR PDF table; here, the results are provided for the
CDF. 

The NCOUNT table displays the characteristics of the sample (sample size and percentage) in
each reported interval for each item. This is an important feature, as users can make application- 
specific choices about interval width and score ranges in ResidPlots-2. 

The PFIT table provides the results of the Lz person fit statistic for each person in the sample.
Note that this report lists the probability values for each person, where values below 0.05 are 
indicative of person misfit. 

The P_RISE table contains results for the root integrated square error statistic (RISE), which is a
nonparametric fit statistic. As with the PFIT table, results are shown in terms of probability values 
for each item, where values less than 0.05 are indicative of nonparametric item misfit. 
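
The two item-level statistics and the 0.05 flagging rule described above can be illustrated with a short Python sketch. The formulas are the common textbook forms of the Pearson chi-square and likelihood-ratio (G square) statistics computed over ability intervals; whether ResidPlots-2 uses exactly these forms is an assumption, and the function names are hypothetical.

```python
import math


def chi_square_fit(n, observed, expected):
    """Pearson chi-square over ability intervals for a dichotomous item.

    n: examinees per interval; observed/expected: proportions correct per interval.
    Expected proportions must lie strictly between 0 and 1.
    """
    return sum(
        nk * (o - e) ** 2 / (e * (1.0 - e))
        for nk, o, e in zip(n, observed, expected)
    )


def g_square_fit(n, observed, expected):
    """Likelihood-ratio (G square) statistic over the same intervals."""
    total = 0.0
    for nk, o, e in zip(n, observed, expected):
        if o > 0.0:  # a zero observed proportion contributes nothing to this term
            total += nk * o * math.log(o / e)
        if o < 1.0:
            total += nk * (1.0 - o) * math.log((1.0 - o) / (1.0 - e))
    return 2.0 * total


def flag_misfit(p_values, alpha=0.05):
    """Return the labels (items or persons) whose fit probability is below alpha."""
    return [label for label, p in p_values.items() if p < alpha]
```

For example, flag_misfit({"item1": 0.03, "item2": 0.50}) returns ["item1"], mirroring the rule that probability values below 0.05 indicate misfit.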


The plots in Figures D-1 and D-2 are samples of output from ResidPlots-2 that depict the item-fit plot. 
Note that the 3P model was fit to the data for Figure D-1, while a 1P model was fit to the same data 
for Figure D-2. Figure D-2 illustrates that results from the observed calibration are much more 
disparate from the simulated results than the results shown in Figure D-1, which suggests that the 
3P model provides better model-data fit than the 1P model for the data. 




Figure D-1. ResidPlots-2 Item Fit Plot (data fit by 3P) 


[Figure: screenshot of the ResidPlots Plot Dialog showing an item fit plot by item number.]


Figure D-2. ResidPlots-2 Item Fit Plot (data fit by 1P) 

[Figure: screenshot of the ResidPlots Plot Dialog showing the “Item Fit” plot, with observed and simulated values plotted by item number.]




Appendix F: Cognitive Lab Final Report




Smarter Balanced 
Assessment Consortium: 


Cognitive Laboratories Technical Report 


Developed by: The American Institutes for Research 
September 27, 2013 


Executive Summary 


The Smarter Balanced Assessment Consortium conducted cognitive laboratories to better 
understand how students solve various types of items. A cognitive laboratory uses a think-aloud 
methodology in which students speak their thoughts while solving a test item. The interviewer follows 
a standardized protocol to elicit responses and record what a student says. While this one-on-one 
process is time consuming, the type of information elicited is often difficult to obtain by other means. 
This report presents the results of a series of cognitive laboratory observational studies. The studies 
were conducted with small numbers of students in order to gather in-depth qualitative data about 
how students react to different types of items, formats, etc. Due to the small number of subjects 
studied and the ad hoc nature of the achieved sample of participants, the findings should be used to 
point the way to more systematic studies, rather than be cited as an authoritative source of scientific 
findings. 


This executive summary presents the major findings from various protocols. Most protocols were 
developed at multiple grade bands (e.g., 3, 6, and 11). A grade band is the level of content for which 
the protocol is targeted. Protocols were usually targeted to answer a specific question in one or more 
content areas (e.g., ELA, mathematics). Results are organized under topics or questions of interest. 


Summary and Findings of Cognitive Lab Results by Research Question 


Research Question 1: Do mathematics multi-part selected-response (MPSR) items provide information
about the test taker’s depth of understanding similar to that provided by traditional
constructed-response (CR) items?


An MPSR item has students select several examples of a correct response rather than just one, as in 
the typical selected-response (SR) item. The intention of this research question was to see whether 
the MPSR items provided depth of understanding similar to that provided by CR items. If effective, an 
MPSR item would be a more efficient way to measure the content measured by CR items. Within a 
form, parallel items were constructed in both formats and presented to the same students. In the 
protocols the MPSR and CR items were presented in random order. 


This research question sought to address two hypotheses. The first hypothesis examined whether 
students who get full credit on MPSR items reveal, through their think-aloud sessions, greater 
understanding than those students who do not achieve full credit. The second hypothesis examined 
whether students who get full credit on MPSR items reveal depth of understanding similar to that of 
students who get full credit on similarly challenging CR items measuring the same target. 


In most cases, the depth of knowledge (DOK) demonstrated by the student for the MPSR items 
either equaled or exceeded the DOK demonstrated for the CR items. Students who got full credit on 
the MPSR items also revealed greater understanding of the material than those who did not obtain 
full credit. The percentage of students understanding the material was also quite similar for the
MPSR and CR items. A typical interviewer comment was, “based on the accuracy of the student’s
responses to both types of items, it appears that item type is not a factor in determining how well the 
students respond[s].” 


Research Question 2: Under what conditions do specific types of TE items (and SR items) approach 
the depth of knowledge (DOK) of a written constructed response in ELA and mathematics? 


This question was designed to assess whether different types of technology-enhanced (TE) items 
approach the DOK of CR items for specific content claim/targets and DOK levels. SR items were also 
included, where available, as a comparison item format. Comparisons were examined for specific TE 
item types at specific DOK levels for specific content claims/targets. CR and SR items were matched 
to specific content claims/targets and DOK 4 items in one of the three formats (SR, TE, and CR) 
appeared in each form. Multiple forms were administered, each form to a different sample of 
students. It was hypothesized that students responding to items of a specific type would reveal that 
they were using thought processes consistent with a specific DOK level for items measuring a 
specific target. Different item types were administered to different students. 


For ELA, a higher percentage of students demonstrated thought processes consistent with the 
specific DOK levels for most of the TE item types than for the attached CR items. Two exceptions 
were two targets in the “select text” item type: “justifying interpretations” (grade band 6) and 
“analyzing the figurative” (grade band 11). A similar pattern was observed for the matched SR items
versus the CR items. 


Regarding student performance on the ELA items, the pattern of results was very similar to that
observed for the DOK consistency-of-thought processes. The same TE item types had higher
percentages of students receiving the maximum score than did the matched CR items with the
exception of the “select text” items for the “writing or revising strategies” target (grade band 7) and
the “citing to support inferences” target (grade band 11).


For the SR items in ELA, the percentage receiving the maximum score was higher than both the CR 
and TE formats for the following “select text” items: 


• “select text” for justifying interpretations, claim 1, DOK 2 in grade band 6
• “select text” for citing to support inferences, claim 1, DOK 2 in grade band 11
• “select text” for analyzing the figurative, claim 1, DOK 2 in grade band 11


For mathematics, the results were more varied. Compared to the matched CR items, the following 
TE item types had a higher percentage of students demonstrating thought processes consistent with 
the DOK level. 


• “placing points” for fractions, claim 1, DOK 2 in grade band 3
• “single lines” for equations and inequalities, claim 1, DOK 2 in grade band 11
• “tiling” for fractions, claim 1, DOK 2 in grade band 3
• “tiling” for equations and inequalities, claim 1, DOK 2 in grade band 11 (“Student indicated
  use of multiple steps and solved correctly.”)
• “vertex-base quadrilaterals” for lines, angles, and shapes, claim 4, DOK 3 in grade band 4


The item types in which the CR items had a higher percentage of DOK-consistent thought processes 
included: 


• “select and order” for apply arithmetic to algebraic expressions, claim 1, DOK 2 in grade
  band 6
• “tiling” for everyday mathematic problems, claim 4, DOK 3 in grade band 4
• “tiling” for apply arithmetic to algebraic expressions, claim 1, DOK 2 in grade band 6
• “tiling” for everyday mathematic problems, claim 2, DOK 3 in grade band 11
• “vertex-base quadrilaterals” for lines, angles, and shapes, claim 1, DOK 2 in grade band 4


The TE item types for which a higher percentage of students received full credit included only: 


• “tiling” for equations and inequalities, claim 1, DOK 2 in grade band 11, and
• “vertex-base quadrilaterals” for lines, angles, and shapes, claim 1, DOK 2 in grade band 4.


In other cases, the percentage of students receiving full credit was lower for the TE item types than 
for the comparable CR items. It should be noted that the percentage receiving full credit was 
generally low in mathematics for all three item formats. Even the matched SR items generally did not 
perform any better than either the CR or TE items. 


Research Question 3: For multi-part selected-response (MPSR) items where students may select
more than one answer choice, which wording best indicates to the student that he or she is allowed
to select more than one option? For multipart dichotomous-choice (e.g., YES/NO) items, do students
know that they need to answer each part?


Smarter Balanced sought to investigate whether students might become confused with MPSR items 
in mathematics and perhaps not complete the entire item. In order to investigate this, items were 
constructed with different amounts of labeling. Labeling is the identification of the parts of the 
problem with indicators such as “a,” “b,” “c” or “1,” “2,” “3.” For each MPSR item, labeled and
non-labeled conditions were investigated. An example of an item in labeled and non-labeled format
can be found in Exhibit 1.


This question was designed to assess whether labeling or not labeling an MPSR mathematics item 
produces a difference in performance. Forms were constructed for five grade bands, with each form 
containing one MPSR item followed by one CR item. The labeled and non-labeled items appeared in 
different forms of the test and thus were taken by different students. 


Even though the labeling of MPSR items was intended to clarify the mathematics tasks for the
students, in many cases it actually seemed to confuse the students. Little difference was observed
between the labeled and non-labeled items in the lower grade bands (grade bands 3-6). However,
students in grade band 7 tended to score higher with non-labeled items. Also, students in grade
bands 7 and 11 tended to be confused by the labeling. In addition, the labeled items tended to
receive more comments related to not understanding the instructions. The interviewer confirmed this,
suggesting that the grade bands 7 and 11 students better understood the instructions in the
non-labeled condition than in the labeled condition.


Research Question 4: Does the ability to move one or more sentences to different positions provide 
evidence of students’ ability to revise text appropriately in the consideration of chronology, 
coherence, transitions, or the author’s craft? 


Smarter Balanced is considering using ELA items that have students reorder sentences to measure 
an editing/revising standard. Claim 2 of the ELA standards states that students should be able to 
revise one or more paragraphs demonstrating specific narrative strategies (use of dialogue, sensory 
or concrete details, description), chronology, appropriate transitional strategies for coherence, or 
authors’ craft appropriate to the purpose of the item (closure, detailing characters, plot, setting, or 
an event). 


This question was designed to assess whether students’ movement of one or more sentences to 
different positions provided evidence of students’ ability to demonstrate consideration of chronology, 
coherence, transitions, or author’s craft. Six ELA items were included in a test form. 


Students who performed well on the items were more likely to consider the targeted writing skills 
(e.g., chronology, coherence, transitions, and author's craft) when answering the questions. The 
results showed that students who made more appropriate sentence moves (and fewer inappropriate 
moves) were more likely to consider the writing skills of chronology, coherence, and transitions. The 
pattern was less clear for consideration of author’s craft. 


Research Question 5: Do students who construct text reveal more understanding of targeted writing 
skills than students who manipulate writing through the manipulation of text (MT) tasks?


Many believe that the best way to measure writing is to have students write. However, in a testing 
environment, it is often difficult to adequately sample the writing content domain with an 
assessment composed exclusively of CR items. An effort is ongoing to find items that are efficient 
but that can adequately measure the components of the writing domain, thus allowing a broader 
selection and greater number of items to be delivered. This question examined whether students 
responding to MT tasks would demonstrate understanding of the targeted writing skills comparable 
to the understanding demonstrated for CR tasks assessing the same claim and target. Examples of 
the item types can be found in Exhibit 2. 


For each of three grade bands (3, 6, and 11), four pairs of ELA items were developed, each pair
consisting of one MT version and one CR version of the same item. Two forms were created for each
grade band; each form contained a single version of each item, for a total of two MT items and two
CR items per form. The MT items were almost exclusively “select and order”
items, though two items—one in grade band 3 and one in grade band 11—were “reorder text” items. 
All items assessed claim 1, target 1. 


The results showed that the targeted writing skills are considered by students who manipulate text at 
a level comparable to (or greater than) that encountered when they are constructing text. The 
students in grade bands 3 and 6 showed comparable (or greater) levels of understanding when the 
items were in an MT format. For the grade band 11 students, the results were mixed, but students 
tended to be more effective in applying the targeted writing skills in the CR format, particularly for 
transitions and author’s craft. Score distributions were comparable for MT and CR item formats. 


Research Question 6: Do different types of directions (minimal, concise, or extensive) have an effect 
on the performance of technology-enhanced (TE) items in ELA and mathematics?


The optimal amount of direction that should be given to a student working with TE items is unclear. 
With minimal directions, students may not know how to approach an item; with extensive directions, 
students may be distracted or slowed to a point where the item becomes inefficient. This may be 
particularly true with elementary school students, who may take longer to process text. This question 
examined this issue for ELA and mathematics items. Three types of directions were used (minimal, 
concise, and extensive). 


In most cases in ELA, the level of instruction did not make a difference. For most grade bands and 
item types, neither the level of instruction nor the item type showed a differential effect in ELA. Cases 
in which differences were observed included “select text” items when the directions were “concise.” 
With the “reorder text” items, the grade band 3 students did less well with minimal directions. The 
grade band 11 students also had some difficulty with the “reorder text” items when the directions
were “extensive.” 


In mathematics, the level of instruction also did not make a difference for many item types and 
grade bands. “Select and order” items were difficult (grade bands 6 and 11) regardless of the 
direction type; however, no direction type proved better than another. High percentages of students
received full credit on “select defined partition” and “straight lines” items; however, the direction 
type did not make a difference. Finally, “tiling” items were generally difficult, but no benefit was 
shown for different types of directions. Differences were observed in items including “placing points” 
items under the minimal and concise directions in grade band 11; however, under extensive 
directions, all students received the maximum score. With “placing points and tiling” items, a higher 
percentage of students received full credit with fewer instructions (grade band 6). Finally, “vertex- 
based quadrilateral” items seemed to benefit from minimal directions in grade band 11. 


When asked if they had difficulty using the computer, ELA students, in grade band 3, under minimal 
directions, said they had trouble with both “select text” and “reorder text” items. The ELA grade 
band 11 students also seemed to have some difficulty with the “reorder text” items. Since these 
difficulties were related to specific item types, the results suggest that there was uncertainty about 
how to perform the task, rather than uncertainty about using the computer itself. Mathematics 
students did not seem to have any problems using the computer. 


Research Question 7: Smarter currently intends to administer the passage first, and then administer 
the items one item at a time. Does this affect student performance? 


Smarter Balanced is interested in the possibility of administering items adaptively within a passage. 
This would require administering items sequentially so that the ability estimate could be updated 
after each item. Presenting items one at a time may take longer, and students may object to not 
knowing what is coming next. This question is designed to assess whether administering an item set 
takes longer when the items are presented sequentially and whether there is a difference in 
confusion or frustration level when students are presented a passage and all the items together or 
are presented a passage with the items then being presented one at a time. The item sets were not 
administered adaptively. 
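
To make concrete what updating the ability estimate after each item involves, the following Python sketch recomputes an expected a posteriori (EAP) estimate after every response. It is a hedged illustration, not the Smarter Balanced procedure: the Rasch response model, the standard normal prior, the quadrature grid, and the function names are all assumptions.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_after_each_item(responses, difficulties, grid=None):
    """Return the EAP ability estimate after each successive item.

    responses: 0/1 scores in administration order.
    difficulties: item difficulty for each administered item.
    """
    if grid is None:
        grid = [-4.0 + 0.1 * k for k in range(81)]  # ability points from -4 to 4
    # Start from an unnormalized standard normal prior over the grid.
    posterior = [math.exp(-0.5 * t * t) for t in grid]
    estimates = []
    for x, b in zip(responses, difficulties):
        # Multiply the current posterior by the likelihood of the new response.
        posterior = [
            w * (rasch_p(t, b) if x == 1 else 1.0 - rasch_p(t, b))
            for w, t in zip(posterior, grid)
        ]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
        estimates.append(sum(w * t for w, t in zip(posterior, grid)))
    return estimates
```

Because each estimate depends on all responses given so far, the next item cannot be selected until the current one has been answered, which is why adaptive administration within a passage implies presenting the items one at a time.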


Two sets of items were created for a given test form. Both sets contained passages of equivalent 
length and difficulty as well as items of equivalent difficulty.1 The first set in a form presented the 
passage with all the items together. The second set presented the passage with the items presented 
one at a time.


The forms were administered, within grade band, to different samples of students. Each sample 
contained both a general education group (Gen Ed) and a group of students who received English
language accommodations (ELL). One sample was timed without thinking aloud during the
administration. Each item set in these forms was separately timed. This sample provided timing 
information only. The second sample involved thinking aloud while responding to the questions and 
was not timed. 


The primary questions of interest were: 


1. Does presenting the items individually after the passage appear to take longer (timed condition)? 
2. Does presenting the items individually after the passage increase the student’s negative 
emotional states (e.g., frustration, confusion; think-aloud condition)? 

3. Do students prefer one approach or another (think-aloud condition)? 


The time it took to complete the sets when all items were presented together or one at a time varied 
by grade band and sample. For the grade band 3 and grade band 11 samples, timing differed little 
whether the items were presented at once or one at a time. However, for grade band 6, presenting 
the items one at a time took substantially longer for both the Gen Ed and ELL samples. While there is 
some variability between the ELL and the Gen Ed samples, the differences are not large and show 
the same pattern within grade band. 


There appears to be slightly more confusion for both the Gen Ed and the ELL samples in grade
band 3 when all the items are presented together. However, similar frustration levels were observed
under the two formats for the grade band 3 students. Students working on the grade band 6 ELL
sample showed similar patterns of frustration and confusion in both presentation formats. However,
the Gen Ed grade band 6 students showed slightly more confusion when the items were presented
one at a time.


1 Comparable passage difficulty was achieved through the use of readability and Lexile measures. Comparable item
difficulty was achieved through DOK measures. 


The grade band 6 students tended to score higher when the items were presented all at once (for 
both the Gen Ed students and the ELL students). The grade band 3 students showed similar results, 
regardless of sample or administration format. The grade band 11 Gen Ed students scored higher 
when the items were presented one at a time, while the grade band 11 ELL sample students scored 
higher when the items were presented all together.


Both the ELL and Gen Ed grade band 3 students preferred to have the items presented one at a time. 
Grade band 11 students had a slight bias toward having the items presented one at a time. 
Conversely, grade band 6 students preferred to have the items presented together. 


Research Question 8: Smarter intends to present relatively long passages. Do longer passages 
reduce student engagement? 


Smarter Balanced is interested in using passages that are longer than those presently used. The 
Smarter Balanced recommended passage lengths are: for grades 3-5: 450-562 words for short 
passages and 563-750 words for long passages; for grades 6-8: 650-712 words for short 
passages and 713-950 words for long passages; and for high school, 800-825 words for short 
passages and 826-1100 words for long passages. There is concern that the longer passages may 
tax the processing abilities of ELL students and students with disabilities (SWD). 


This question is designed to assess whether longer passages reduce student engagement, hamper 
the completion of the longer passages, or affect the depth of processing of the passage. Two sets of 
items were created. Both sets contained passages of equivalent difficulty with four items of 
equivalent difficulty attached to each passage. Both sets present the passage and all the items 
together. Each form contained a standard-length passage and an extended-length passage. The first 
set contained a passage of standard length. The second set contained a passage that is longer than 
standard length (extended-length, the length equivalent to that intended for use by Smarter 
Balanced). 


The design was intended to compare the performance of two groups of students—ELL/SWD and Gen 
Ed students—across three grade bands: 3, 6, and 11. Twelve students took the forms. Of these, nine 
were grade band 3 Gen Ed students and one grade band 3 student was classified ELL/SWD. The 
single grade band 6 student was an ELL/SWD student. The two grade band 11 students were Gen
Ed students. 


All the ELL/SWD students were unaffected by the use of the longer passage. They were able to read 
the entire passage regardless of passage length and demonstrated that the longer passage was 
processed at a deep level. The ELL/SWD students also were not bored or distracted while reading 
either passage. 


In contrast, Gen Ed students did appear to be affected by the longer passage in grade bands 3
and 11. About 75 percent of the grade band 3 students and all of the grade band 11 students were
affected by the use of the longer passage. Only 43 percent of the grade band 3 Gen Ed students and
50 percent of the grade band 11 Gen Ed students demonstrated a level of deep processing. Also,
some percentage of the Gen Ed students were bored, regardless of the length of the passage.


Research Question 9: How long does it take for students to read through complex texts, 
performance tasks, etc.? Is timing affected by the way students are presented the passage and 
items? 


One way of making items more difficult is to increase their complexity. Complex items often take 
longer to solve or answer. In computer adaptive tests, added complexity may decrease the time a 
high ability student has to complete the test if the items are made more difficult through increased 
complexity. This potentially creates some fairness issues in an adaptive test if there is a time limit on 
the test. This question was designed to assess the time it takes for students to answer complex and 
simpler items. Complexity was defined as a function of the DOK demanded by the test question. It 
was hypothesized that more complex tasks would take more time. 


Each ELA form had six items. These items varied in item complexity (simple or complex) and item 
format (SR, TE, or CR). The TE items were all “hot text” (HT) items. These items require the student to 
either highlight the text or drag the text to answer the item. 


Forms were constructed in ELA at two grade bands: grade band 3-5 (referred to as grade band 3) 
and grade band 6 and 7 (referred to as grade band 6). Two forms were administered in grade band 3.
One form was administered in grade band 6. 


It was hypothesized that more complex items would take longer to complete than simpler items, but
no evidence was found to support this hypothesis. SR items were answered in the shortest time. HT
items took about one minute longer than SR items. CR items took the most time to answer, about
75 seconds longer than the hot text items.


Research Question 10: Working mathematics problems on computer: Communicating mathematics 
on computer—feasibility of measuring student understanding of items for Claims 2-4 on computer. 


With paper tests some students write in their test books while working out mathematics problems. 
When mathematics items are presented on computer, scratch paper is often provided if students 
want to transfer the problem to paper and work it out there. Because scratch paper is often 
destroyed after an online testing session, the degree to which scratch paper is used is not known; 
neither is the importance of scratch paper in working out a problem (or potentially for use in scoring). 
This research question examines the need for paper when solving mathematics problems. 


Each student was presented with three grade-appropriate items. The interviewer recorded whether 
the student made a comment, and the nature of the comment, while working the mathematics 
problems. The students first tried to work a problem without paper. Scratch paper was then offered 
to the student to rework the problem, if desired. The interviewer noted whether students chose to 
add anything additional and noted the nature of the addition (more text, equations, graphics). Note 
that there were only three comments for the third item in the lowest grade band, 3. 


The general conclusion is that a subset of students benefit from being able to work mathematics 
problems on paper. This appears to be especially important when students are beginning to learn 
algebra concepts. 


Grade band 3 students did not need paper to work the problems. However, in the grade band 6 and
grade band 7 groups, 30-42 percent indicated they wanted to write an equation. In grade bands 6,
7, and 11, the additional information recorded on paper would have improved the response
according to the rubric. Responses for specific items in grade bands 6 and 11 were improved for
15 percent of the students, and responses for all items in grade band 7 were improved when
information on the scratch paper was taken into account. Improvement for this group ranged
between 10 and 20 percent of the responses. (“Confused me, I didn’t know how to write an
equation.” “Tried the keypad, but it wouldn’t work.” “It was much easier with paper.”) This was
supported by interviewer observations. About 5-10 percent of students in each grade band found
the online system difficult to use, but few specifics were recorded.




Research Question 11: Usability of equation editor tool—can students use the tool the way it is
meant to be used? 


Although students begin to use technology at a very early age, it is prudent to verify that young 
students are able to use the assessment interface to be used during testing. This question sought to 
evaluate the ability of grades 3-5 students to use the equation editor tool to be included in the 
Smarter Balanced delivery system. Three mathematics items were presented to the students (N=33). 
The first item only required the student to copy his or her response. The second item was a simple 
mathematics item, and the third item was a more challenging mathematics item. The first item 
would demonstrate whether the student could use the equation editor tool. The second and third 
items would provide evidence of whether the ability to use the tool interacted with item difficulty. 


Elementary students had some difficulty using the equation editor. Between 15 and 30 percent of 
the students indicated that they had difficulty using the equation editor. The examiner’s assessment 
concurred that about 35 percent of students had difficulty using the equation editor and that about 
50 percent of the students would get a given item correct. 


Research Question 12: Can students compare the size of a product to the size of one factor, on the 
basis of the size of the other factor, without performing the indicated multiplication? 


This question is designed to assess whether students with a strong understanding of fractions and 
the multiplication and division of fractions complete the items without performing the indicated 
multiplication. The task asked students to compare the size of a product to the size of one factor, on 
the basis of the size of the other factor, without performing the indicated multiplication. Also of 
interest was whether students who complete an item as intended (without using multiplication) 
spent less time on an item than those who did not. To investigate this question a single form was 
administered for grades 3-5. 


There seemed to be little relationship between whether a student has a strong understanding of the 
multiplication and division of fractions and whether he or she used multiplication to solve the items. 
However, students who did not need to perform the multiplication completed the items in less time 
than students who had to perform the multiplication. While most students said they understood the 
questions, 70 percent had to use multiplication to solve them. Only about 40 percent of the students
had a firm understanding of the multiplication/division of fractions, according to the interviewers. 




Research Question 13: Contextual glossaries are item-specific glossaries that provide a definition of 
a word that is targeted to, and appropriate for, the context in which the word is used in the item. Are 
these a fair and appropriate way to support students who need language support? 


This question addressed the efficacy of the use of contextual glossaries with non-native speakers 
when solving mathematics problems. Two sets of items were created that were parallel in difficulty. 
The first set of items contained no contextual glossaries; only single words were translated. The
second set of items contained contextual glossaries. The interviewer was asked to determine
whether the student was having trouble understanding a word and whether the contextual glossary 
aided in the interpretation of the word or sentence. 


Only three ELL students participated: one from grade 3 and two from grade 6. 


The contextual glossaries appeared to be somewhat effective, but the impact was not always 
reflected in the score the student received for an item. The contextual glossaries appeared to be 
incomplete in that they did not include words the student needed. This limited the use of the 
glossaries in these situations. Interviewers’ comments suggested that performance was improved
when the students used the contextual glossaries. 


Research Question 14: Under what conditions does the use of text-to-speech (TTS) help students 
with lower reading ability focus on content in ELA and mathematics? 


TTS can provide access to an assessment for students with low reading ability. In order for this 
technology to be effective the language produced from the voice-pack must be clear enough to be 
understood. This is particularly true for non-native speakers of English. 


Only students familiar with TTS were included in the study. Overall, 77 students used TTS at least 
once. Among them, 58 students were limited English proficient (LEP), 13 students had reading 
difficulties (IEP), and six were Gen Ed students. 
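
The composition of the TTS sample can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts come from the paragraph above, while the variable names and the percentage breakdown are additions that do not appear in the report.

```python
# Counts of students who used TTS at least once, as reported in the text.
tts_users = {"LEP": 58, "IEP (reading difficulties)": 13, "Gen Ed": 6}
reported_total = 77

# Sum the groups and express each one as a share of the whole sample.
total = sum(tts_users.values())
shares = {group: round(100 * n / total, 1) for group, n in tts_users.items()}

print(f"total: {total} (matches reported total: {total == reported_total})")
for group, share in shares.items():
    print(f"{group}: {share}%")
```

The three groups add up to the reported 77 students, and roughly three quarters of the sample were LEP students, which is worth keeping in mind when reading the findings by subgroup.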


In ELA four forms were administered with both high- and low-quality voice-packs. In mathematics, 
two forms were administered in grade bands 3 and 11. Only a single form was administered in grade 
band 6. The mathematics forms were only administered with high-quality voice-packs. 


TTS improved access in ELA regardless of the quality of the voice-pack. Greater access was achieved 
when high-quality voice-packs were used. LEP students and students with reading difficulties tended 
to benefit more from the use of TTS. Using TTS with high-quality voice-packs improved focus on 
content in ELA. The use of TTS with low-quality voice-packs tended to distract students in ELA, 
whereas high-quality voice-packs did not. In mathematics, access was improved only for grade
band 3 students. All Gen Ed, IEP, and grade band 6 LEP students found the high-quality voice-pack
distracting. This was in part a function of trying to describe a table verbally. 




Introduction 


Smarter Balanced has conducted cognitive laboratories to better understand how students solve 
items in different formats. A cognitive laboratory uses a think-aloud methodology in which students 
speak their thoughts while solving a test item. The interviewer follows a standardized protocol to 
elicit responses and record what a student says. While this one-on-one process is time consuming, 
the type of information elicited is often difficult to obtain by other means. Due to the nature of the 
process the sample sizes are often small; however, they are sufficient to detect large effects. In 
addition, because each student’s comments are recorded, smaller, non-primary effects may be 
brought to light. Most protocols were developed at multiple grade bands (e.g., 3, 6, and 11). A grade 
band is the level of content for which the protocol is targeted. 


What follows are in-depth analyses for each research question outlined in the executive summary. 
Because of the differences in the samples, study design, and questions asked, each research 
question result is presented separately. A summary of the findings for each research question is 
provided at the end of each research question section. Research questions have been organized into 
sections of similar content to improve integration of the material. Finally, a conclusions section 
appears at the end of the document. The overall demographics for the cognitive labs sample can be 
found in Appendix B. 


Processing Selected-Response (SR), Technology-Enhanced (TE), and 
Constructed-Response (CR) Items 


Research Question 1: Do mathematics multi-part selected-response (MPSR) items provide similar
information about the depth of understanding by the test taker as do traditional
constructed-response (CR) items?


An MPSR item has students select several examples of a correct response rather than just one, as in 
the typical SR item. The intention of this research question was to see whether the MPSR items 
provided depth of understanding similar to that of CR items. If effective, an MPSR item would be a 
more efficient way to measure the content measured by CR items. Also of interest was whether 
similar results would be obtained at different educational levels. To investigate these questions, 
forms were constructed at four grade bands: grades 3-4 (referred to as grade band 3), grades 6-7 
(referred to as grade band 6), grades 7-8 (referred to as grade band 7), and grades 9-10 (referred
to as grade band 11). Within a form, parallel items were constructed in both formats and presented 
to the same students. In the protocols, the MPSR and CR items were presented in random order. 


Interviewers were asked to assess the highest depth of knowledge (DOK) level the student demonstrated during the
think-aloud session. Table 1 (ELA) and Table 2 (mathematics) show the rubrics the interviewers used 
during this process. 


Two hypotheses related to research question 1 were examined. The first hypothesis examined 
whether students who get full credit on MPSR items reveal, through their think-aloud sessions, 
greater understanding than those students who do not achieve full credit. The second hypothesis 
examined whether students who get full credit on MPSR items reveal understanding similar to that of 
students who get full credit on similarly challenging CR items measuring the same target. 




Table 1. Depth of Knowledge Chart (ELA) 


DOK Level | Definition | Types of statements
1 | Recall and Reproduction | 1. Recalls facts, details, and events
  |  | 2. Uses word relationships (synonym/antonym) to determine meaning
  |  | 3. Recognizes or retrieves information from tables and charts
2 | Basic Skills and Concepts | 1. Summarizes information
  |  | 2. Identifies central ideas
  |  | 3. Uses context to determine word meanings
  |  | 4. Analyzes text structure and organization
  |  | 5. Compares literary elements, facts, terms, or events
3 | Strategic Thinking and Reasoning | 1. Uses supporting evidence to explain, generalize, or connect ideas
  |  | 2. Analyzes or interprets author’s craft (literary devices, viewpoint, potential bias) to critique a text
  |  | 3. Develops a logical argument and cites evidence




Table 2. Depth of Knowledge Chart (Mathematics) 


DOK Level | Definition | Types of statements
1 | Recall and Reproduction | I remembered it.
  |  | We learned the answer in class.
  |  | I did what it said.
  |  | I recognized it.
2 | Basic Skills and Concepts | 1. Any statement indicating putting two or more pieces of knowledge together
  |  | 2. Any statement indicating that they executed a sequence of steps that was not given to them
  |  | 3. Any inference relating two different things
  |  | 4. Expression of a hypothesis or guess about a relationship
3 | Strategic Thinking and Reasoning | 1. Any statement indicating that they are applying abstract concepts to concrete phenomena, e.g., “Both patterns reflect exponential growth”
  |  | 2. Statements indicating that the students evaluated several different approaches to solving the problem, accompanied by the ability to explain why they selected the solution path they chose
  |  | 3. Explanations of their choices or decisions using data and information from multiple sources to construct a coherent and logical argument




Results 


Twenty students were administered the grade band 3 form, 37 students were administered the
grade band 6 form, 31 students were administered the grade band 7 form, and 19 students were 
administered the grade band 11 form. 


Table 3 presents the average DOK level demonstrated by students who received full credit on an
item for each grade band/target. Table 4 shows the correspondence between the target labels and 
the full target description. Blank cells are the result of incomplete data, either in the score or in the 
demonstrated DOK. In most cases, the DOK the student demonstrated for the MPSR items either 
equals or exceeds the DOK demonstrated for the CR items. Interviewers commonly commented that 
the student did equally well on both item formats. 


Table 3. Average DOK Demonstrated by Students Who Received Full Credit for Paired MPSR and CR
Items Measuring the Same Target


Grade Band | Target | Item Format | Avg. DOK
[Cell values are not recoverable from the source. Legible targets, in order of appearance:
Geometric Measurement: Perimeters (J); Analyze Proportional Relationships (A); Generate
Equivalent Expressions (C); Apply Arithmetic to Algebra; Solve Linear Equations; Equivalent
Problem Solving (E), grade band 11; Graph Equations and Inequalities (J), grade band 11; Use of
Functions.]


Table 4. Correspondence Between Target Label and the Full Target Description


Target Label | Full Target Description
Geometric Measurement: Perimeters | Geometric measurement: recognize perimeter as an attribute of plane figures and distinguish between linear and area measures
Reason with Shapes | Reason with shapes and their attributes
Place Value: Whole ... | ...
Converting Units of ... | Solve problems involving measurement and conversion of measurements ...
Geometric Measurement: Perimeters | Geometric measurement: recognize perimeter as an attribute of plane figures and distinguish between linear and area measures
One Variable Equations | Reason about and solve one-variable equations and inequalities
Apply Arithmetic to Algebra | Apply and extend previous understandings of arithmetic to algebraic expressions
Generate Equivalent Expressions | ...
Analyze Proportional Relationships | Analyze proportional relationships and use them to solve real-world and ...
Solve Linear Equations | Analyze and solve linear equations and pairs of simultaneous linear ...
Equivalent Problem Solving | ...
Graph Equations and Inequalities | ...
Use of Functions | Understand the concept of a function and use function notation




The first hypothesis examined whether students who get full credit on the MPSR items reveal
greater understanding of the material than those who do not obtain full credit. Table 5 presents
these findings. In all cases those who received full credit for an item showed greater understanding
than those who did not receive full credit. The percentage of students who understood the material
is also quite similar for the MPSR and CR items.


Table 5. Percentage of Students Who Appear to Understand the Material, by Item Type, Grade Band, 
and Whether Full Credit Was Received 


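Grade Band | Non-Full Credit | Full Credit | Non-Full Credit | Full Credit | Non-Full Credit | Full Credit
[Cell values and the labels for the paired columns are not recoverable from the source.]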


Summary 


This research question sought to address two hypotheses. The first hypothesis examined whether 
students who get full credit on MPSR items reveal, through their think-aloud sessions, greater 
understanding than those students who do not achieve full credit. The second hypothesis examined 
whether students who get full credit on MPSR items reveal depth of understanding similar to that of 
students who get full credit on similarly challenging CR items measuring the same target. 


In most cases, the DOK the student demonstrated for the MPSR items either equaled or exceeded 
the DOK demonstrated for the CR items. Students who got full credit on the MPSR items also 
revealed greater understanding of the material than those who did not obtain full credit. The 
percentage of students understanding the material was also quite similar for the MPSR and CR 
items. A typical interviewer comment was, “based on the accuracy of the student’s responses to both 
types of items, it appears that item type is not a factor in determining how well the students 
respond[s].”




Research Question 2: Under what conditions do specific types of TE items (and SR items) approach 
the depth of knowledge (DOK) of a written constructed response in ELA and mathematics? 


The question was designed to assess whether different types of TE items approach the DOK of CR 
items for specific content claim/targets and DOK levels. SR items were also included, where 
available, as a comparison item format. Comparisons were examined for specific TE item types at 
specific DOK levels for specific content claims/targets (see Appendix A for a full description of the 
claims and targets). Where possible, parallel items were created in each item format at the same 
DOK level and content claim/target; however, some combinations were not available. In ELA, items 
in the different formats were administered for most item type/content target/DOK combinations. In
mathematics, however, some item formats were not administered for all claim/target/DOK 
conditions and some data were incomplete. This limited the comparisons that could be made. Four 
items in one of the three formats (MPSR, TE, and CR) appeared in each form. Multiple forms were 
administered, each to a different sample of students. It was hypothesized that students responding 
to items of a specific TE type would reveal that they were using thought processes consistent with a 
specific DOK level for items measuring a specific target. 


Forms were constructed in ELA at five grade bands: grade 3 (referred to as grade band 3), grades 4- 
5 (referred to as grade band 4), grades 6-7 (referred to as grade band 6), grades 7-8 (referred to as 
grade band 7), and grade 11 (referred to as grade band 11). In mathematics, forms were 
constructed at four grade bands: grades 3-4 (referred to as grade band 3), grades 4-5 (referred to 
as grade band 4), grades 6-7 (referred to as grade band 6), and grade 11 (referred to as grade
band 11). Note that the grade band relates to the level of the material in the assessment and not
necessarily the grade of the students to which the assessment is administered. A single form was 
administered in each grade band. This was a between-subjects design in which different item types 
were administered to different students. For this question, the comments presented are made by the 
interviewer, as opposed to the student, due to the nature of the information being captured (e.g.,
DOK level demonstrated). 




Results 


Table 6 shows the sample sizes within a grade band by item format across item types and content 
area. The ELA forms tended to have been administered to larger samples than were the 
mathematics forms. 


Table 6. Sample Sizes Within Grade Band, by Content Area and Item Type 


Content | Item Format | Grade Band 3 | 4 | 6 | 7 | 11
[Item-format labels and column alignment are not recoverable from the source, and some digits may
be missing. Legible row values: ELA: 18, 16, 13, 8, 6; ELA: 2, 14, 10, 8, 14; ELA: 4, 13, 13, 15, 10;
M: 6, 23, -, 10.]


Tables 7a (ELA) and 7b (Mathematics) list the percentage of students whose thought processes were
consistent with the DOK level of the items for the respective content areas. For each TE item type,
the percentage of students who demonstrated thought processes consistent with the grade
band/content claim and target/DOK was recorded. MPSR and CR items were matched to the same
grade band/content claim and target/DOK levels. The primary comparison of interest is between the
TE and CR formats.


For ELA, students demonstrated a higher DOK level for most of the TE item types than for the 
matched CR items. (“Well thought out. Uses evidence she feels supports the main idea of the item.”)
Two exceptions were two targets in the “select text” item type: “justifying interpretations” (grade 
band 6) and “analyzing the figurative” (grade band 11). A pattern similar to that of the TE item types 
was observed for the matched MPSR items versus the CR items. 




Table 7a. Percentage of Students Demonstrating That They Are Using Thought Processes at the
Specified DOK Level, by Item Type, Claim, Target, and DOK Level (ELA)


Column heading: % of Students With Consistent Thought Process
[Cell values are not recoverable from the source. Legible TE item types: Drag and Drop (Tiling),
Reorder Text, Select Text. Legible targets: Justifying interpretations; Writing or revising
strategies; Organizing ideas; Citing to support inferences; Analyzing the figurative.]


For mathematics, the pattern is less clear. The TE item types that yielded a higher percentage of 
students demonstrating thought processes consistent with the DOK level included: 


• “placing points” for fractions, claim 1, DOK 2 in grade band 3 (“This student had a thorough
  understanding of these fractions and how they related to the number line. He thoroughly and
  accurately placed each fraction and explained how/why using various steps.”)
• “single lines” for equations and inequalities, claim 1, DOK 2 in grade band 11
• “tiling” for fractions, claim 1, DOK 2 in grade band 3 (“This student clearly understood and
  explained how to solve this item using multiple methods. He used multiple steps to solve
  each item.”)
• “tiling” for equations and inequalities, claim 1, DOK 2 in grade band 11 (“Student indicated
  use of multiple steps and solved correctly.”)
• “vertex-based quadrilaterals” for lines, angles, and shapes, claim 4, DOK 3 in grade band 4


For the following TE item types, the percentages of students demonstrating thought processes 
consistent with the DOK level were equal for the TE and CR formats. 


• “select and order” for fractions, claim 1, DOK 2 in grade band 3
• “select and order” for fractions, claim 1, DOK 2 in grade band 6
• “selecting points” for fractions, claim 1, DOK 2 in grade band 3
• “single lines” for everyday math problems, claim 2, DOK 2 in grade band 11




The TE item types for which the matched CR items yielded a higher percentage of students who 
demonstrate consistent thought processes included: 


• “select and order” for apply arithmetic to algebraic expressions, claim 1, DOK 2 in grade
  band 6
• “tiling” for everyday math problems, claim 4, DOK 3 in grade band 4
• “tiling” for apply arithmetic to algebraic expressions, claim 1, DOK 2 in grade band 6 (“The
  student was able to explain his answer in multiple steps and with a clear understanding of
  the distributive property.”)
• “tiling” for everyday math problems, claim 2, DOK 3 in grade band 11
• “vertex-based quadrilaterals” for lines, angles, and shapes, claim 1, DOK 2 in grade band 4
  (“This student understood right angles. She also understood that she had to name a
  similarity and a difference.”)


Table 7b. Percentage of Students Demonstrating That They Are Using Thought Processes at the 
Specified DOK Level, by Item Type, Claim, Target, and DOK Level (Mathematics) 


Column heading: % of Students With Consistent Thought Process
[Cell values are not recoverable from the source. Legible TE item types: Placing Points, Select
and Order, Selecting Points, Single Lines, Tiling, Vertex-Based Quadrilaterals. Legible targets:
Fractions; Apply arithmetic to algebraic expressions; Equations and inequalities; Everyday math
problems; Lines, angles, and shapes.]




Also of interest was how students performed on these item types. Since not all items are 1-point 
items, the percentage obtaining the maximum score was used. Table 8a presents this information 
for ELA; Table 8b presents this information for mathematics. In ELA, the pattern is very similar to the 
consistency of thought process table. The same TE item types had higher percentages of students 
with maximum scores than did the CR items, with the exception of the “select text” items for the 
“writing or revising strategies” target (grade band 7), and the “citing to support inferences” target
(grade band 11). 


For the MPSR items in ELA, the percentage receiving the maximum score was higher than both the 
CR and TE formats for the following “select text” items: 


• “select text” for justifying interpretations, claim 1, DOK 2 in grade band 6
• “select text” for citing to support inferences, claim 1, DOK 2 in grade band 11
• “select text” for analyzing the figurative, claim 1, DOK 2 in grade band 11


Table 8a. Percentage of Students Receiving Full Credit for an Item (ELA) 
Column heading: % of Students With Maximum Score
[Cell values are not recoverable from the source. Legible rows: Drag and Drop; Reorder Text
(Writing or revising strategies); Reorder Text (Organizing ideas); Select Text (Justifying
interpretations); Select Text (Writing or revising strategies); Select Text (Citing to support
inferences); Select Text (Analyzing the figurative).]


In mathematics, the TE items for which a higher percentage of students received the maximum 
possible score included only: 


• “single lines” for equations and inequalities, claim 1, DOK 2 in grade band 11
• “tiling” for equations and inequalities, claim 1, DOK 2 in grade band 11
• “vertex-based quadrilaterals” for lines, angles, and shapes, claim 1, DOK 2 in grade band 4




For the MPSR items in mathematics, the percentage receiving the maximum score was higher than 
both the CR and TE formats for the following items: 


• “placing points” for fractions, claim 1, DOK 2 in grade band 3
• “select and order” for fractions, claim 1, DOK 2 in grade band 3
• “selecting points” for fractions, claim 1, DOK 2 in grade band 3
• “tiling” for fractions, claim 1, DOK 2 in grade band 3
• “tiling” for everyday math problems, claim 4, DOK 3 in grade band 4
• “vertex-based quadrilaterals” for fraction equivalence and ordering, claim 1, DOK 2 in grade
  band 3


In other cases the percentage receiving the maximum score was lower than for the comparable CR 
items. It should be noted that the percentage receiving the maximum scores was generally low in 
mathematics. 


Table 8b. Percentage of Students Receiving Full Credit for an Item (Mathematics) 
Column heading: % of Students With Maximum Score
TE Type | Grade Band | Target | Claim | DOK | TE | MPSR
[Cell values are not recoverable from the source. Legible rows: Select and Order (including Apply
arithmetic to algebraic expressions); Selecting Points (Fractions); Single Lines (Everyday math
problems); Single Lines (Equations and inequalities); Tiling (Everyday math problems);
Vertex-Based Quadrilaterals (Fraction equivalence and ordering); Vertex-Based Quadrilaterals
(Lines, angles, and shapes).]


Summary 


For ELA, students demonstrated a higher DOK level for most of the TE item types than for the 
matched CR items. Two exceptions were two targets in the “select text” item type, “justifying 
interpretations” (grade band 6) and “analyzing the figurative” (grade band 11). A similar pattern was 
observed for the matched MPSR items versus the CR items. 


Regarding student performance on the ELA items, the pattern of results is very similar to the 
consistency of thought process table. The same TE item types had higher percentages than did the 
CR items, with the exception of the “select text” items for the “writing or revising strategies” target 
(grade band 7) and the “citing to support inferences” target (grade band 11). 


For the MPSR items in ELA, the percentage receiving the maximum score was higher than both the 
CR and TE formats for the following “select text” items: 


• “select text” for justifying interpretations, claim 1, DOK 2 in grade band 6
• “select text” for citing to support inferences, claim 1, DOK 2 in grade band 11
• “select text” for analyzing the figurative, claim 1, DOK 2 in grade band 11


For mathematics, the results were more varied. The TE item types that showed a higher percentage 
of students demonstrating thought processes consistent with the DOK level included:


• “placing points” for fractions, claim 1, DOK 2 in grade band 3
• “single lines” for equations and inequalities, claim 1, DOK 2 in grade band 11
• “tiling” for fractions, claim 1, DOK 2 in grade band 3
• “tiling” for equations and inequalities, claim 1, DOK 2 in grade band 11 (“Student indicated
  use of multiple steps and solved correctly.”)
• “vertex-based quadrilaterals” for lines, angles, and shapes, claim 4, DOK 3 in grade band 4


Places where equal percentages were observed for the TE and CR formats included: 


• “select and order” for fractions, claim 1, DOK 2 in grade band 3
• “select and order” for fractions, claim 1, DOK 2 in grade band 6
• “selecting points” for fractions, claim 1, DOK 2 in grade band 3
• “single lines” for everyday math problems, claim 2, DOK 2 in grade band 11


Item types where the CR items had a higher percentage of consistent thought processes included:


• “select and order” for apply arithmetic to algebraic expressions, claim 1, DOK 2 in grade
  band 6
• “tiling” for everyday math problems, claim 4, DOK 3 in grade band 4
• “tiling” for apply arithmetic to algebraic expressions, claim 1, DOK 2 in grade band 6
• “tiling” for everyday math problems, claim 2, DOK 3 in grade band 11
• “vertex-based quadrilaterals” for lines, angles, and shapes, claim 1, DOK 2 in grade band 4


The TE item types where a higher percentage of students received full credit included only: 


• “tiling” for equations and inequalities, claim 1, DOK 2 in grade band 11
• “vertex-based quadrilaterals” for lines, angles, and shapes, claim 1, DOK 2 in grade band 4


For the MPSR items in mathematics, the percentage receiving the maximum score was higher than 
both the CR and TE formats for the following items: 


• “placing points” for fractions, claim 1, DOK 2 in grade band 3
• “select and order” for fractions, claim 1, DOK 2 in grade band 3
• “selecting points” for fractions, claim 1, DOK 2 in grade band 3
• “tiling” for fractions, claim 1, DOK 2 in grade band 3
• “tiling” for everyday math problems, claim 4, DOK 3 in grade band 4
• “vertex-based quadrilaterals” for fraction equivalence and ordering, claim 1, DOK 2 in grade
  band 3


In other cases, the percentage receiving full credit for the MPSR items was lower than for the 
comparable CR items. It should be noted that the percentage receiving full credit was generally low 
in mathematics for all three item formats. 


Research Question 3: For multi-part selected response (MPSR) items where students may select
more than one answer choice, which wording best indicates to the student that he or she is allowed 
to select more than one option? For multipart (e.g., YES/NO) dichotomous choice items, do students 
know that they need to answer each part? 


Smarter Balanced sought to investigate whether students might become confused by MPSR items in 
mathematics and perhaps not complete the entire item. In order to investigate this, items were 
constructed with different amounts of labeling. Labeling is the identification of the parts of the 
problem with indicators such as “a,” “b,” “c” or “1,” “2,” “3.” A “labeled” and a “non-labeled”
condition were investigated. An example of items in the labeled and unlabeled format is presented 
below (Exhibit 1). 


This question is designed to assess whether labeling or not labeling an MPSR mathematics item 
produces a difference in performance. Results are reported in five grade bands. The five grade 
bands are designated as grade band 3 (which includes form difficulty levels 3 and 4), grade band 4 
(which includes form difficulty levels 4 and 5), grade band 6 (which includes form difficulty levels 6 
and 7), grade band 7 (which includes form difficulty levels 7 and 8), and grade band 11 (which
includes form difficulty level 11). Each form contains one MPSR item followed by one CR item. The 
labeled and non-labeled items appeared in different forms of the test and thus were taken by 
different students. 




Exhibit 1. Example of a Labeled Item 


Marcus has 36 marbles. He is putting an equal number of marbles into 4 bags. 


Indicate whether each equation could be used to find the number of marbles Marcus puts in each 
bag. 


1. 36 × 4 = [ ]     ○ Yes  ○ No
2. 36 ÷ 4 = [ ]     ○ Yes  ○ No
3. 4 × [ ] = 36     ○ Yes  ○ No
4. 4 ÷ [ ] = 36     ○ Yes  ○ No


Example of an Unlabeled Item 


Marcus has 36 marbles. He is putting an equal number of marbles into 4 bags. 


Indicate whether each equation could be used to find the number of marbles Marcus puts in each 
bag. 


36 × 4 = [ ]     ○ Yes  ○ No
36 ÷ 4 = [ ]     ○ Yes  ○ No
4 × [ ] = 36     ○ Yes  ○ No
4 ÷ [ ] = 36     ○ Yes  ○ No
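

As a rough illustration of what the item in Exhibit 1 asks of the student, the sketch below solves each of the four equations for its blank and checks whether the result equals the number of marbles per bag (36 marbles shared equally among 4 bags). It reads the second and fourth equations as division, which is an assumption about the original item. The variable names are hypothetical and are not part of the Smarter Balanced materials.

```python
# Hypothetical check of the four equations in Exhibit 1.
# Assumption: the second and fourth equations use division.
from fractions import Fraction

TOTAL_MARBLES = 36
BAGS = 4

# The quantity the item is after: marbles in each bag.
per_bag = Fraction(TOTAL_MARBLES, BAGS)

# Value of the blank when each equation is solved.
blanks = {
    "36 x 4 = []": Fraction(36 * 4),   # blank is the product
    "36 / 4 = []": Fraction(36, 4),    # blank is the quotient
    "4 x [] = 36": Fraction(36, 4),    # blank is the missing factor
    "4 / [] = 36": Fraction(4, 36),    # blank is the missing divisor
}

# An equation earns "Yes" only if its blank equals the marbles per bag.
answers = {eq: ("Yes" if value == per_bag else "No") for eq, value in blanks.items()}

for eq, answer in answers.items():
    print(f"{eq:<12} {answer}")
```

Under that reading, two of the four parts are "Yes" and two are "No", so a student has to respond to every part; this is the behavior the labeling conditions were meant to probe.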


Smarter Balanced Cognitive Laboratories Technical Report 


Results 


Ninety-six students were administered the grade band 3 forms, 66 students were administered the 
grade band 4 forms, 133 students were administered the grade band 6 forms, 33 students were 
administered the grade band 7 forms, and 85 students were administered the grade band 11 forms. 


Table 9 shows the percentage of students receiving full credit on the items by grade band and 
labeling condition. For grade bands 3, 4, 6, and 11 little difference between the labeled and non- 
labeled conditions is observed. However, in grade band 7 a higher percentage of students received
full credit in the non-labeled format. 


Table 9. Percentage of Students Receiving Full Credit, by Grade Band and Labeling Condition. 


Condition | Grade Band 3 | 4 | 6 | 7 | 11
Non-Labeled | 32 | 32 | 20 | 62 | 16
Labeled | 29 | 31 | 18 | 34 | (illegible)
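

To make the comparison in Table 9 concrete, the following sketch computes the non-labeled minus labeled difference, in percentage points, for each grade band. The values are transcribed from Table 9 as read here. The labeled value for grade band 11 is not legible in the source and is left as None. The variable names are illustrative only.

```python
# Percentage of students receiving full credit, as read from Table 9.
# None marks a value that is not legible in the source.
grade_bands = [3, 4, 6, 7, 11]
non_labeled = {3: 32, 4: 32, 6: 20, 7: 62, 11: 16}
labeled = {3: 29, 4: 31, 6: 18, 7: 34, 11: None}

# Difference in percentage points (non-labeled minus labeled),
# skipping any grade band with a missing value.
gaps = {
    band: non_labeled[band] - labeled[band]
    for band in grade_bands
    if labeled[band] is not None
}

# Grade band with the largest gap between conditions.
largest_band = max(gaps, key=gaps.get)

for band, gap in gaps.items():
    print(f"grade band {band}: {gap:+d} percentage points")
print(f"largest gap: grade band {largest_band}")
```

The gaps for grade bands 3, 4, and 6 are 3 points or fewer, while grade band 7 shows a 28-point gap, which matches the pattern described in the text. With samples this small, the differences are descriptive only.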


Table 10 shows whether the students understood the instructions under the different item labeling 
conditions. Up through grade band 6 the type of instructions received seemed to have little impact 
on the understanding of the instructions. However, in grade bands 7 and 11 a higher percentage of 
students tended not to understand the instructions when the items were labeled. The interviewers 
commented that “Student did not have a complete understanding of instructions” and “He said he 
understood, however, he only selected one bubble.” 


Table 10. Percentage Understanding the Instructions, by Grade Band and Labeling Condition 


Condition | Grade Band 3 | 4 | 6 | 7 | 11
[Cell values for the Non-Labeled and Labeled rows are not recoverable from the source.]


Table 11 shows the percentage of students who made comments about not understanding the
instructions. Grade bands 3 and 11 had more comments about not understanding the instructions
than the other grade bands, but the pattern was similar for labeled and non-labeled items. However,
in grade band 7, non-labeled items generally received no comment, with labeled items receiving
more comments. This is consistent with a lower percentage of grade band 7 students
understanding the instructions in the “labeled” condition.


Table 11. Did the Student Make Comments About not Understanding the Instructions (Percentage 
Making Comments)? 


Grade Band


Summary 


Even though the labeling of MPSR items was intended to clarify the mathematics tasks for the
students, in many cases it actually seemed to confuse the students. Little difference was observed
between the labeled and non-labeled items in the lower grade bands (grade bands 3-6). However,
students in grade band 7 tended to score higher with non-labeled items. Also, grade band 7 and 11
students tended to be confused by the labeling. In addition, the labeled items tended to receive
more comments related to not understanding the instructions. The interviewer confirmed this,
suggesting that the grade band 7 and 11 students better understood the instructions in the
non-labeled condition than in the labeled condition.


Research Question 4: Does the ability to move one or more sentences to different positions provide 
evidence of students’ ability to revise text appropriately in the consideration of chronology, 
coherence, transitions, or the author’s craft? 


Smarter Balanced is considering using items that have students reorder sentences to measure an 
editing/revising standard. Claim 2 of the standards states that students should be able to revise one 
or more paragraphs demonstrating specific narrative strategies (use of dialogue, sensory or concrete 
details, description), chronology, appropriate transitional strategies for coherence, or authors’ craft 
appropriate to purpose (closure, detailing characters, plot, setting, or an event). 


This question was designed to assess whether students’ movement of one or more sentences to 
different positions provides evidence of students’ ability to demonstrate consideration of chronology, 
coherence, transitions, or author’s craft. Six ELA items were included in a test form. The forms were 
administered to five students: two in grade 5, two in grade 6, and one in grade 10. Because there is 
little difference in the pattern of responses and because the sample sizes are small, the results will 
be reported for the sample as a whole. 




Results 


It was hypothesized that students who do well on these items would recognize the need to revise
for chronology, coherence, transitions, or author’s craft. Table 12 shows the percentage of students
who recognized the need to revise for chronology, coherence, transitions, or author’s craft among
students who performed well on the items and those who performed poorly. The results show that
students who performed well were more likely to consider chronology, coherence, transitions, or
author’s craft in their revisions than students who performed poorly. Among the four writing skills
examined, author’s craft was considered less often than the other three.


Table 12. Percentage of Students Considering Targeted Writing Skills When Revising, by Those 
Students Who Performed Well and Those Who Performed Poorly 


Students Who Students Who 
Perform Well Perform Poorly 


Author’s Craft    50%


Also of interest was whether students referenced organization, coherence, transitions, or author’s 
craft when moving sentences. Table 13 shows the percentage of students who considered each of 
the targeted writing skills relative to the number of appropriate and inappropriate sentence moves. 
The results suggest that students who make more appropriate sentence moves (and fewer 
inappropriate sentence moves) are more likely to consider the writing skills of chronology, coherence, 
and transitions; however, the pattern is less clear for consideration of author's craft. 




Table 13. Percentage of Students Who Considered Chronology, Coherence, Transitions, and 
Author’s Craft at Each Number of Appropriate and Inappropriate Sentence Moves 


% Students Who
Recognized Need      Number of Appropriate Sentence Moves     Number of Inappropriate Sentence Moves
for                  0    1    2    3    4    5    6    7     0    1    2    3    4    5    6    7

Chronology           38   50   50   67   75        100  100   100  100  40   50   33   0         0
Coherence            38   48   67   67   78        100  100   100  100  40   50   33   0         0
Transitions          38   50   50   67   75        100  100   100  100  40   50   33   0         0


Author’s
craft 13 | 13 33 | 25 6/7 29 17 | 20/50] 33 


Table 14 shows the percentage of students who considered chronology, coherence, transitions, and 
author’s craft when answering the items as observed by the interviewers. Students did express 
consideration of chronology (“I moved the first sentence because it goes at the top,” “This seems to 
be in order,” “This should be the second to last sentence”); coherence (“This seems like something 
you'd say,” “I don’t need to take out more phrases, it sounds OK,” “I removed the two sentences 
because they did not make sense and were irrelevant”); and transitions (“This would sound better 
here”) when answering the questions; however, fewer took author’s craft (“I think there is a flow to 
the story,” “Some sentences are awkward and need to be moved”) into account when answering 
these questions. 


Table 14. Percentage of Students Who Considered Chronology, Coherence, Transitions, and Author’s 
Craft When Answering, Across Items 


Writing Skills    Chronology    Coherence    Transitions    Author’s Craft


Powenige| 68 | TB 


Summary 


Students who performed well on the items were more likely to consider the targeted writing skills 
(chronology, coherence, transitions, and author’s craft) when answering the questions. Also, 
students who made appropriate sentence moves were more likely to consider the targeted writing 
skills than those who made inappropriate sentence moves. A high percentage of students 
considered chronology, coherence, and transitions; however, they were less likely to consider 
author’s craft. 




Research Question 5: Do students who construct text reveal more understanding of targeted 
writing skills than students who manipulate writing through the manipulation of text (MT) tasks? 


Many believe that the best way to measure writing is to have students write. However, in a testing 
environment, it is often difficult to adequately sample the writing content domain with an 
assessment composed exclusively of CR items. There is an ongoing effort to find items that are 
efficient, but that can adequately measure the components of the writing domain, thus allowing for a 
broader selection and greater number of items to be delivered. Examples of the types of questions 
used can be seen in Exhibit 2. This question examined whether students responding to MT tasks 
would demonstrate understanding of the targeted writing skills comparable to the understanding 
demonstrated for matched CR tasks. 


Four pairs of ELA items were developed. Each pair contained one MT item and one CR version of the 
same item. Two forms were created, with each form containing a single version of an item. Each 
form contained two MT items and two CR items. The MT items were almost exclusively “select and 
order” items, though two of the items - one in grade band 3 and one in grade band 11 - were 
“reorder text” items. All items assessed claim 1, target 1. 
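
The form design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the report does not say which version of each pair went on which form, so the alternating assignment below, the function name `build_forms`, and the labels "Form A" and "Form B" are hypothetical.

```python
from collections import Counter


def build_forms(n_pairs=4):
    """Assign the MT and CR versions of each item pair to two forms.

    Assumption: odd-numbered pairs place the MT version on Form A and
    even-numbered pairs place it on Form B. The report only states that
    each form holds one version of every item, two MT and two CR.
    """
    form_a, form_b = [], []
    for pair in range(1, n_pairs + 1):
        if pair % 2 == 1:
            form_a.append((pair, "MT"))
            form_b.append((pair, "CR"))
        else:
            form_a.append((pair, "CR"))
            form_b.append((pair, "MT"))
    return form_a, form_b


form_a, form_b = build_forms()
for name, form in (("Form A", form_a), ("Form B", form_b)):
    # Count how many items of each format ended up on the form.
    counts = Counter(fmt for _, fmt in form)
    print(name, form, dict(counts))
```

Any assignment that gives each form two MT and two CR items, with the two versions of a pair on different forms, would satisfy the description equally well.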


Forms were constructed in ELA at three grade bands: grades 3-5 (referred to as grade band 3), 
grades 6 and 7 (referred to as grade band 6), and grades 10 and 11 (referred to as grade band 11).
In grade band 3, two forms were administered; in grade bands 6 and 11, only a single form was 
administered. All forms assessed claim 1, target 1. 


The sample consisted of seven students in grade band 3, two students in grade band 6, and one 
student in grade band 11. 




Exhibit 2. Sample Items Used in this Research Question 
Stem 


A student wrote the first draft of a story about a girl who eats nine berries for an afternoon snack 
every day. Read the story. Then complete the task that follows. 


Every day after school, Kim eats nine red, juicy raspberries. One day, Kim sits down at the big
kitchen table and has a surprise. She notices that one of her berries is missing! “[ ],” she says.

“I counted nine just a minute ago,” Dad says.

“[ ],” Kim says. “[ ].”

Kim begins her search in the garage. “[ ]?” Kim asks.


Dialogue
Oh no! There are only eight raspberries in my bowl


I wonder what happened to the ninth berry
Grandma, why are your mouth and lips red


It looks like I have a mystery to solve


Revise the story to include dialogue that introduces the plot. Place each piece of dialogue in the 
correct place in the story. 


The dialogue will go in the brackets. 




CR Prompt 


A student wrote the first draft of a story about a girl who eats nine berries for an afternoon
snack every day. Read the story. Then complete the task that follows.


Every day after school, Kim eats nine red, juicy raspberries. One day, 
Kim sits down at the big kitchen table and has a surprise. She notices 
that one of her berries is missing! 

Her dad had counted nine just a few minutes ago. 


Kim knew she had a mystery to solve. 
Kim began her search in the garage. She found her grandmother in 
the garage with bright red lips. 


Revise the story to include dialogue. Use dialogue to introduce the plot. 
Type your response in the space provided. 




Table 15. Targeted Writing Skills with Examples of Representative Statements 
Target Types of Statements 


Chronology I knew it was telling a story, so looked for the
beginning then moved the rest around to make
sense.
I knew what the end was, so worked backwards
from there.
I knew the youngest son went last, so put him
at the end, then put the two older ones before
him. Then picked the beginning and put it first.
Some spots didn’t sound quite right, so added
the sentences in.

Read the sentences, then looked for related
sentences in the passage that they’d go with.

I used transitions to cue position of sentences.
I need to revise the order of the sentences so
that they more clearly support the main idea of
the article. I do not need to move the first or
last sentence.

Coherence Sentence is like a preview of the rest of the
essay, so it should go first.

This sentence sounds professional and it also
connects to the facts that follow. This is the
best thesis statement.

This sentence wraps up the author’s
argument/point of view and finishes the essay
by restating the main point.

The conclusion often just rephrases the thesis,
which this sentence does, but it also talks
about other things from the passage, so it
should be the conclusion.

I have to choose the two sentences that
shouldn't be part of the paragraph.

I have to take the sentence at the top and drag
it to best spot in the paragraph below.

Transitions The word “next” tells him it comes after 
something else. 

The word “first” is a clue that it goes at the 
beginning. 

“Finally” usually tells you you’re at the end. 
A transition like “therefore” at the start of a 
sentence connects it to the sentence before. 
They have the same topic but this one comes 
second. 

I have to use transitions words to make the
paragraph clearer.

I looked at the transition words to see what
should come before them, then put in a
sentence if needed.




Author's Craft I found the parts that didn’t give me a really
clear picture in her mind and changed them.
I looked for the parts that weren’t as
descriptive as the rest and made them more
descriptive.
I looked for the parts that sounded a little
boring and made them more exciting.

I read the topic sentences and looked for the
sentence that didn't go with it.

If a sentence makes the argument weaker,
then it should be taken out, so these two need
to be removed.


Results 


It was hypothesized that student think-alouds on MT items would reference the appropriate writing
skills reflected in the assessment target at a level comparable with CR items. Table 16 shows the 
percentage of students who referenced the targeted writing skills, by item format and grade band. In 
grade band 3, chronology was more likely to be considered during revision when the item format was 
MT (“First, next, last order of events”) than when the item format was CR (“Historically probably 
comes first, having trouble ending story”). Similar patterns, but less pronounced, were seen with 
coherence, transitions (“This is a cause...as a result (an effect) should be here”), and author’s craft. 
Grade band 3 students only considered author’s craft during revision for about one-third of the items 
regardless of item format. Grade band 6 students always considered chronology and coherence 
during revision, but transitions and author’s craft were only considered about half the time. In grade 
band 11 chronology, coherence, and transitions were always considered in both formats. Author’s 
craft was only considered about half the time in the CR format and not mentioned at all in the MT 
format. One interviewer commented, “Student made no comment about author’s craft.” 


Table 16. Percentage of Items in Which Students Considered Target Characteristics When 
Responding to the Item, by Item Format 


Grade Band


100 100 
100 100 


100 100 
100 100 
100 
100 




Table 17 shows the counts for item scores received for the two item formats, by grade band. 
Comparable scores were achieved for the two item formats. 


Table 17. What Score (Across Items) Would the Student Receive on this Type of Item? 
                Grade Band
                3            6            11
Item Format     0   1   2    0   1   2    0   1   2


Table 18 provides information about whether the students who construct text through writing reveal 
comparable or greater understanding of targeted writing skills than students who manipulate text. 
The grade band 3 and grade band 6 students were either more effective in applying the targeted 
writing skills when the items were in a MT format or no differences were observed in effectiveness 
between item formats. For the grade band 11 students the results were mixed, but students tended 
to be more effective in applying the targeted writing skills in the CR format, particularly for transitions 
and author’s craft. 


Table 18. Effectiveness of Applying Targeted Writing Skills by Item Format (Percentage of Students 
as Assessed by Interviewer) 


                   Grade Band
Characteristics    MT More Effective    No Difference    CR More Effective

Summary 


The results showed that the targeted writing skills are considered by students who manipulate text at
a level comparable to (or greater than) that encountered when they are constructing text. The grade
band 3 and 6 students showed comparable (or greater) levels of understanding when the items were
in an MT format. For the grade band 11 students the results were mixed, but students tended to be
more effective in applying the targeted writing skills in the CR format, particularly for transitions and
author’s craft. Score distributions were comparable for MT and CR item formats.


Research Question 6: Do different types of directions (minimal, concise, or extensive) have an effect 
on the performance of technology enhanced (TE) items in ELA and Mathematics? 


The optimal amount of direction that should be given to students for some item types is unclear. 
With minimal directions students may not know how to approach the item; with extensive directions 
students may be distracted or slowed to a point where the item becomes inefficient. This may be 
particularly true with elementary school students, who may take longer to process text. This question 
examined these issues for ELA and mathematics items. Three types of directions (minimal, concise, 
and extensive) were examined for different item types. 


Forms were constructed in ELA at five grade bands: grade 3 (referred to as grade band 3), grades 4 
and 5 (referred to as grade band 4), grades 6 and 7 (referred to as grade band 6), grades 7 and 8 
(referred to as grade band 7), and grade 11 (referred to as grade band 11) with a single form 
administered in each grade band. In mathematics, forms were constructed at four grade bands: 
grades 3 and 4 (referred to as grade band 3), grades 4 and 5 (referred to as grade band 4), grades 6 
and 7 (referred to as grade band 6), and grade 11 (referred to as grade band 11).


Parallel items were created with minimal, concise, or extensive directions in ELA and for most item 
types in mathematics. However, not all direction types appeared with all item types in all grades in 
mathematics. Four items in one of the three formats (SR, TE, and CR) appeared in each mathematics 
form. Two items in one of the three formats appeared in each ELA form. Multiple forms were 
administered, each one to a different sample of students. An example of the different direction types 
for an ELA item and a mathematics item is presented in Exhibit 3. 




Exhibit 3. Example of the Types of Instructions Under the Minimal, Concise, and Extensive 
Instruction Condition for the Item That Follows 


ELA Example 


Minimal Directions 
Drag the best transition word to each blank in the paragraph. 


Concise Directions 
Complete the paragraph by selecting the best transition word that fits in each blank. Drag each 
transition word you selected to the correct blank in the paragraph. 


Extensive Directions 

There are six transition words in the text box. Complete the paragraph correctly by choosing a 
transition word that best fits each blank. Drag the transition word you selected from the text box to 
the correct blank in the paragraph. 


It was winter. The cold wind was blowing and
snow was covering the ground. Sarah gazed out
the window and saw a bird trying to find food.
She wanted to help the bird. After thinking for a
while, Sarah decided to make a pinecone bird
feeder. First, she tied a string to the top of a
pinecone. ______, she covered the pinecone with
peanut butter. After this, she placed the pinecone
in the freezer. Later, she rolled the pinecone in
birdseed. ______, she placed the pinecone bird
feeder on a tree for the birds.




Mathematics Example 


Minimal Directions
Drag numbers to make the equations true.


Concise Directions
Move numbers to make the equations true.
Drag the numbers to the answer space.


Extensive Directions
Drag numbers to make the equations true.
Each number can be used only once. To use a number, drag it to the appropriate box in an equation.




Results 
Table 19 provides a count of the students in a grade band, by content area and direction type. 


Table 19. Sample Sizes by Content Area, Direction Type, and Grade Band 


Content    Direction Type    Grade 3    Grade 4    Grade 6    Grade 7    Grade 11


[Btensve | 8 | a7 | a | 7 | 6 
SC 
30 fad Sid 
ef 4 SCC 


Table 20a shows the percentage of students receiving full credit for the ELA items by direction type, 
item type, and grade band. In grade band 3, “select text” items were more challenging than “reorder 
text” items. This was especially true when the directions were “concise.” With the “reorder text” 
items the grade band 3 students did less well with minimal directions. The grade band 11 students 
also had some difficulty with the “reorder text” items when the directions were “extensive.” For the 
other grade bands, neither the level of instruction nor the item type showed a differential effect. 


Table 20a. Percentage of Students Who Received Full Credit on ELA Items by Direction Type and 
Grade Band 


Grade Band


ELA
Direction Type Item Type




In mathematics, a low percentage of students received full credit for “placing points” under the
minimal and concise directions in grade band 11 (Table 20b). However, under extensive directions
all students received full credit. With “placing points and tiling” items, a higher percentage of
students received full credit as the amount of instruction was reduced (grade band 6). “Select and
order” items were difficult (grade bands 6 and 11) regardless of the direction type; no direction
type proved better than another. The “select defined partition” items and the “straight lines”
items showed high percentages of students receiving the maximum score, but the direction type did
not make a difference. “Vertex-based quadrilateral” items seemed to benefit from minimal directions
in grade band 11. Finally, “tiling” items were generally difficult, but no benefit was shown for
different types of directions. The incompleteness of the data limits other comparisons.




Table 20b. Percentage of Students Who Received Full Credit on Different Types of Mathematics 
Items, by Direction Type and Grade Band


Direction Template Grade Band 
3 4 6 11 

Minimal | PlacingPoints | | | | 

ee 


| Extensive — | Placing Points | Placing Points 


Minimal Placing Points and Tiling
Concise Placing Points and Tiling
Extensive Placing Points and Tiling


Minimal Select Defined Partitions
Concise Select Defined Partitions
Extensive Select Defined Partitions 71 83


Straight Lines 


Extensive Straight Line and 
Tiling 


Concise ae 


a 
ta 
Quadrilaterals 
Quadrilaterals 


Extensive Vertex-Based
Quadrilaterals 




Understanding instructions 


In ELA (Table 21a), for most item type/direction type/grade band combinations few students had 
difficulty understanding instructions. Cases in which difficulties were mentioned included about 50 
percent of the students in grade band 4 with both minimal and extensive instructions for the “select 
and order” items. This was also true in grade band 3 for the “reorder text” items with extensive 
instructions and for the “select text” items with concise and extensive instructions. Finally, in grade
band 11 the “reorder text” items with minimal and concise instructions elicited more comments. 


In mathematics (Table 21b), the cases in which more comments were made about the instructions 
included “placing points” with minimal and concise instructions (grade band 11), “single ray” items 
with extensive instructions (grade band 6), “straight lines” items with extensive instructions, and 
“vertex-based quadrilateral” items with extensive instructions (grade band 3). The single ray item 
with extensive instructions in grade band 6 stood out as an item in which instructions were not well 
understood. (“Weren’t totally sure how instructions were to be completed.”) The percentage of 
students getting the maximum score on this item type was also low. 


Table 21a. Percentage of Students Who Express the Difficulties in Understanding Each Type of
Instruction for Each TE Type in Their Think-Alouds (ELA) 


Grade Band
6 


Select and 
Order 
Select and 
Order 
Select and 
Order 


Select Text 




Table 21b. Percentage of Students Who Express the Difficulties in Understanding Each Type of 
Instructions for Each TE Type in Their Think-Alouds (Mathematics) 


Math Grade Band 


TE type 


Placing Points 
Placing Points 
Placing Points 


Placing Points and 

Placing Points and 
Concise Tiling 

Placing Points and 
Extensive Tiling 


Select and Order 
Select and Order 
Select and Order 


Select Defined 
Minimal Partitions 


Direction 
Type 


Select Defined 
Select Defined 


iol Straight Lines and 
Extensive Tiling 


Vertex-Based 
Quadrilaterals 
Vertex-Based 
Quadrilaterals 
Vertex-Based 
Quadrilaterals 




Difficulty Using the Computer 


The results for ELA related to difficulty using the computer were mixed (Table 22). In grade band 3 
under minimal directions for both “select text” and “reorder text” items, the students seemed to 
have difficulty using the computer. The grade band 11 students seemed to have some difficulty with 
the “reorder text” items. 


Table 22. Percentage of Students Who Said They Had Trouble Using the Computer (ELA) 


Direction Type    Item Characteristic

Minimal           Select Text
Concise           Select Text
Extensive         Select Text

Minimal           Select and Order
Concise           Select and Order
Extensive         Select and Order

Minimal           Reorder Text
Concise           Reorder Text
Extensive         Reorder Text


Most students had little trouble using the computer with mathematics items.


Summary


For most grade bands and item types in ELA, neither the level of instruction nor the item type had a
differential effect. Cases in which differences were observed included “select text” items when the
directions were “concise” (grade band 3). With the “reorder text” items, the grade band 3 students
did less well with minimal directions. The grade band 11 students also had some difficulty with the
“reorder text” items when the directions were “extensive.”


In mathematics, the level of instruction also did not make a difference for many of the item types 
and grade bands. “Select and order” items were difficult (grade bands 6 and 11) regardless of the 
direction type; however, no direction type proved better than another. High percentages of students 
received full credit on the “select defined partition” items and the “straight lines” items; however, the 
direction type did not make a difference. Finally, “tiling” items were generally difficult, but no benefit 
was shown for different types of directions. Places where differences were observed included 
“placing points” under the minimal and concise directions in grade band 11; however, under 
extensive directions all students received the maximum score. In working with “placing points and 
tiling” items, a higher percentage of students received full credit with fewer instructions (grade
band 6). Finally, “vertex-based quadrilateral” items seemed to benefit from minimal directions in
grade band 11.




The results for ELA related to trouble using the computer were mixed. In grade band 3 under 
minimal directions with both select text and reorder text items the students seemed to have 
difficulty using the computer. The grade band 11 students seemed to have some difficulty with the 
“reorder text” items. Mathematics students did not seem to have any problems using the computer. 


ELA Questions, Passage Processing 


Research Question 7: Smarter currently intends to administer the passage first, and then administer 
the items one item at a time. Does this affect student performance? 


Smarter Balanced is interested in the possibility of administering items adaptively within a passage. 
This would require administering items sequentially so that the ability estimate could be updated 
after each item. Presenting items one at a time may take longer, and students may object to not 
knowing what is coming next. This question is designed to assess whether administering an item set 
takes longer when the items are presented sequentially and whether there is a difference in 
confusion or frustration level when students are presented a passage and all the items together or 
are presented a passage with the items then being presented one at a time. The item sets were not 
administered adaptively. 


Two sets of items were created for a given test form. Both sets contained passages of equivalent
length and difficulty as well as items of equivalent difficulty.² The first set in a form presented the
passage with all the items together. The second set presented the passage with the items presented
one at a time.


The forms were administered, within grade band, to different samples of students. Each sample
contained both a general education (Gen Ed) group and a group of ELL students. One
sample was timed without thinking aloud during the administration. Each item set in these forms 
was separately timed. This sample provided timing information only. The second sample involved 
thinking aloud while responding to the questions and was not timed. Forms were constructed in ELA 
at three grade bands: grades 3-5 (referred to as grade band 3), grades 6-8 (referred to as grade 
band 6), and grades 10 and 11 (referred to as grade band 11). 


The primary questions of interest were: 


1. Does presenting the items individually after the passage appear to take longer (timed condition)? 
2. Does presenting the items individually after the passage increase the student’s negative 
emotional states (e.g., frustration, confusion; think-aloud condition)? 

3. Do students prefer one approach or another (think-aloud condition)? 


² Comparable passage difficulty was achieved through the use of readability and Lexile measures. Comparable item
difficulty was achieved through depth of knowledge (DOK) measures.




Results 


Table 23 shows the sample sizes taking each form of the tests, by grade band, for the ELL and Gen 
Ed samples. Sample sizes are smaller for the ELL sample in grade band 11. 


Table 23. Student Counts by Grade Band, Testing Population, and Testing Condition 


Grade Band


Table 24 shows the time (in seconds) it took to complete the item sets when all items were
presented together or items were presented one at a time, by grade band and sample. For the grade
band 3 and grade band 11 samples, timing differed little whether the items were presented in one
block or one at a time. However, for grade band 6, presenting the items one at a time took
substantially longer. While there is some variability between the ELL and the Gen Ed samples, the
differences are not large and show a similar pattern. Note that the grade band 11 ELL sample was a
single student and is not presented to avoid misleading results.
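
The comparison reported in Table 24 is the average completion time with all items presented together minus the average time with items presented one at a time. A minimal sketch of that arithmetic follows. The timings in it are hypothetical placeholders invented for illustration and are not values from the study, and the function name `time_difference` is likewise an assumption.

```python
from statistics import mean


def time_difference(all_items_secs, one_at_a_time_secs):
    """Return (mean all-items time, mean one-at-a-time time, difference).

    The difference is computed as All minus One at a Time, so a negative
    value means the one-at-a-time format took longer on average.
    """
    m_all = mean(all_items_secs)
    m_one = mean(one_at_a_time_secs)
    return m_all, m_one, m_all - m_one


# Hypothetical per-student timings in seconds (NOT data from the report).
all_items = [600, 660, 630]
one_at_a_time = [700, 720, 740]

m_all, m_one, diff = time_difference(all_items, one_at_a_time)
print(f"All items: {m_all} s, one at a time: {m_one} s, difference: {diff} s")
```

With samples as small as those in this study, a difference in means of this kind is descriptive only.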


Table 24. Average Time to Complete the Passage and Items, by Administration Format, Grade Band, 
and Sample 


Columns: Passage + All Items | Passage + One Item at a Time | Difference (All - One at a Time). [Values not recoverable from the source.]


Tables 25 and 26 show whether the ELL or Gen Ed sample students expressed confusion (Table 25) 
or frustration (Table 26) with the passages or items. There appears to be slightly more confusion for 
both the Gen Ed and the ELL sample students in grade band 3 when all the items are presented 
together. However, similar frustration levels were observed under the two formats for the grade band 
3 students. The grade band 6 ELL sample showed similar patterns of frustration and confusion for
the two presentation formats. However, the Gen Ed grade band 6 students showed slightly more
confusion when the items were presented one at a time. The grade band 11 Gen Ed students
showed similar levels of confusion and frustration under both administrative formats. The grade 
band 11 ELL sample included only two students and is not reported. 




Table 25. Percentage of Students Expressing Confusion with the Different Components of the Test 
by Administration Format, Grade Band, and Sample 


Columns: Sample | Test Component (Passage, Items) | All Items: Grade Band 3, 6, 11 | One at a Time: Grade Band 3, 6, 11. [Values not recoverable from the source.]


Table 26. Percentage of the Students Expressing Frustration with the Different Components of the 
Test, by Administration Format, Grade Band, and Sample 


Columns: Sample | Test Component (Passage, Items) | All Items: Grade Band 3, 6, 11 | One at a Time: Grade Band 3, 6, 11. [Values not recoverable from the source.]


Table 27 presents the average score students obtained for the think-aloud protocols. The grade 
band 6 students tended to score higher when the items were presented all at one time (for both the 
Gen Ed students and the ELL students). The grade band 3 students scored higher when the items 
were presented one at a time, regardless of sample or testing condition. The grade band 11 Gen Ed
students scored higher when the items were presented one at a time, while the grade band 11 ELL
sample students scored higher when the items were presented all at one time, though the latter 
sample size is small. 




Table 27. Average Score, by Administration Format, Grade Band, and Sample 


Columns: Sample | All Items at Once (by Grade Band) | One Item at a Time (by Grade Band). [Values not recoverable from the source.]


Table 28 shows the preference for a presentation format. Both the ELL and Gen Ed grade band 3 
students preferred to have the items presented one at a time. (“I preferred one at a time—less 
confusing than seeing too many questions,” “One at a time made me less nervous about how many 
more there were,” “I liked one at a time because it did not seem overwhelming.”) Grade band 11 
students (Gen Ed and ELL) had a slight bias toward having the items presented one at a time (“Let’s 
me focus on that one question”). Conversely, grade band 6 Gen Ed students preferred to have the 
items presented together (“I liked them altogether,” “This way I know I was on the same passage,”
“All together, you can refer to the questions while you read the passage,” “I liked everything on one
page because it was more easy,” “With all together, I was able to refer back and I could see where I
was going,” “I liked altogether, though it was more confusing and distracting.”) The grade band 6 ELL 
students were equally divided between the two formats. 




Table 28. We Presented the Questions to You in Two Different Ways. Which Way Did You Prefer: All Together or One at a Time (Percent 
Responding)? 


Columns: Sample | All Together and One at a Time, reported for each grade band. [Values not recoverable from the source.]




Summary 


We were interested in assessing whether there is a difference in timing and increased negative 
emotional states (confusion, frustration) when students are presented a passage with all the items 
or are presented a passage with the items presented one at a time. Forms were administered to two 
groups of students: a group that received English language accommodations and a Gen Ed group. 


The time it took to complete the sets when all items were presented together or one at a time varied 
by grade band and sample. For the grade band 3 and grade band 11 samples, timing differed little 
whether the items were presented in one block or one at a time. However, for grade band 6,
presenting the items one at a time took substantially longer for both the Gen Ed and ELL samples. 
While there is some variability between the ELL and the Gen Ed samples, the differences are not 
large and show the same pattern within grade band. 


There appeared to be slightly more confusion for both the Gen Ed and the ELL samples in grade 
band 3 when all the items were presented together. However, similar frustration levels were 
observed under the two formats for the grade band 3 students. The grade band 6 ELL sample 
students showed similar patterns of frustration and confusion for the two presentation formats. 
However, the Gen Ed grade band 6 students showed slightly more confusion when the items were 
presented one at a time.


The grade band 6 students tended to score higher when the items were presented all at one time 
(for both the Gen Ed students and the ELL students). The grade band 3 students showed similar 
results, regardless of sample or administration format. The grade band 11 Gen Ed students scored
higher when the items were presented one at a time, while the grade band 11 ELL sample students
scored higher when the items were presented all together.


Both the ELL and Gen Ed grade band 3 students preferred to have the items presented one at a time. 
Grade band 11 students had a slight bias toward having the items presented one at a time. 
Conversely, grade band 6 students preferred to have the items presented together. 


Research Question 8: Smarter intends to present relatively long passages. Do longer passages 
reduce student engagement? 


Smarter Balanced is interested in using passages that are longer than those presently used. The 
Smarter Balanced recommended passage lengths are: for grades 3-5: 450-562 words for short 
passages and 563-750 words for long passages; for grades 6-8: 650-712 words for short 
passages and 713-950 words for long passages; and for high school, 800-825 words for short 
passages and 826-1100 words for long passages. There is concern that the longer passages may 
tax the processing abilities of ELL and SWD students. 
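
The recommended length ranges above can be written as a small lookup. The following is a minimal sketch, assuming only the word counts stated in the text; the dictionary keys and the function name `classify_passage` are illustrative and do not come from the report.

```python
# Recommended passage lengths (in words), taken from the ranges stated above.
# The grade-span keys and the function name are illustrative assumptions.
PASSAGE_LENGTHS = {
    "3-5": {"short": (450, 562), "long": (563, 750)},
    "6-8": {"short": (650, 712), "long": (713, 950)},
    "high school": {"short": (800, 825), "long": (826, 1100)},
}


def classify_passage(grade_span: str, word_count: int) -> str:
    """Return 'short', 'long', or 'out of range' for a passage of word_count words."""
    for label, (low, high) in PASSAGE_LENGTHS[grade_span].items():
        if low <= word_count <= high:
            return label
    return "out of range"


print(classify_passage("3-5", 500))
print(classify_passage("6-8", 900))
print(classify_passage("high school", 700))
```

Under these ranges, a 700-word high school passage falls below the short-passage minimum, which shows how much longer the recommended passages are at the upper grades.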


This question is designed to assess whether longer passages reduce student engagement, hamper 
the completion of the longer passages, or affect the depth of processing of the passage. Two sets of 
items were created. Both sets contained passages of equivalent difficulty with four items of 
equivalent difficulty attached to each passage. Both sets present the passage and all the items 
together. Each form contained a standard-length and an extended-length passage. The first set 
contained a passage of standard length. The second set contained a passage that is longer than
standard length (extended-length, the length equivalent to that intended for use by Smarter
Balanced).


Forms were constructed in ELA at three grade bands: grade band 3-5 (referred to as grade band 3), 
grade band 6-8 (referred to as grade band 6), and grade band 10 and 11 (referred to as grade band 
11). The design was intended to compare the performance of two groups of students—ELL/SWD and 
Gen Ed students—across three grade bands (3, 6, and 11). Thirteen students took the forms. Of 
these, nine were grade band 3 Gen Ed students. One grade band 3 student was classified ELL/SWD. 
The single grade band 6 student was an ELL/SWD student. The two grade band 11 students were 
Gen Ed students. 


Results 


Table 29 shows the percentage of students whose engagement was improved or unaffected by the 
longer passage, by subgroup. All the ELL/SWD students were unaffected by the use of the longer 
passage. Gen Ed students did appear to be affected by the longer passage in grade bands 3 and 11. 
All the ELL/SWD students were able to read the entire passage regardless of passage length. Only 
about 25 percent of the grade band 3 Gen Ed students and none of the grade band 11 Gen Ed 
students were unaffected by the use of the longer passage (see Table 29; “I have to read the whole 
passage?”). The ELL/SWD students all demonstrated that the longer passage was processed at a
deep level (“It was a good story”). However, only 43 percent of the grade band 3 Gen Ed students
demonstrated a level of deep processing (“I learned many new things”) and only 50 percent of the 
grade band 11 Gen Ed students demonstrated a level of deep processing (Table 31). The ELL/SWD 
students were not bored or distracted while reading either passage; however, some percentage of 
the Gen Ed students were bored regardless of the length of the passage. 


Table 29. Percentage of Students Whose Engagement Is Improved or not Affected by the Longer 
Passage 


Subgroup | Grade Band 3 | Grade Band 6 | Grade Band 11
Gen Ed | 25 | | 0
ELL/SWD | 100 | 100 |




Table 30. Percentage of Students Who Appear to Read the Entire Passage 


Standard Length
Subgroup | Grade Band 3
Gen Ed | 88

Extended Length
[Values not recoverable from the source.]


Table 31. Percentage of Students Whose Think-Aloud Demonstrate Deep Processing as Assessed by 
the Interviewer 


Standard Length
Subgroup | Grade Band 3 | Grade Band 6 | Grade Band 11
Gen Ed | 43 | | 100
ELL/SWD | 100 | 100 |

Extended Length
Subgroup | Grade Band 3 | Grade Band 6 | Grade Band 11
Gen Ed | 43 | | 50
ELL/SWD | 100 | 100 |




Table 32. Percentage of Students Who do not Appear Bored or Distracted 


Standard Length
Subgroup | Grade Band 3 | Grade Band 6 | Grade Band 11
Gen Ed | 63 | | 100
ELL/SWD | 100 | 100 |

Extended Length
Subgroup | Grade Band 3 | Grade Band 6 | Grade Band 11
Gen Ed | [illegible] | | 50
ELL/SWD | 100 | 100 |


Summary 


Smarter Balanced is interested in using passages that are longer than those presently used. There is 
concern that the longer passages may tax the processing abilities of ELL and SWD students. This
question is designed to assess whether longer passages reduce student engagement, hamper the 
completion of the longer passages, or affect the depth of processing of the passage. The design was 
intended to compare the performance of two groups of students—ELL/SWD and Gen Ed students— 
across three grade bands (3, 6, and 11). Two sets of items were created. Both sets contained 
passages of equivalent difficulty with four items of equivalent difficulty attached to each passage. 
Both sets present the passage and all the items together. Both the standard-length and the 
extended-length passage were included in a given form and administered to the same student. 


All the ELL/SWD students were unaffected by the use of the longer passage. They were able to read 
the entire passage regardless of passage length and demonstrated that the longer passage was 
processed at a deep level. The ELL/SWD students also were not bored or distracted while reading 
either passage. 


In contrast, Gen Ed students did appear to be affected by the longer passage in grade bands 3
and 11. About 75 percent of the grade band 3 students and all of the grade band 11 students were
affected by the use of the longer passage. Only 43 percent of the grade band 3 Gen Ed students
and only 50 percent of the grade band 11 Gen Ed students demonstrated a level of deep processing.
Also, some percentage of the Gen Ed students were bored, regardless of the length of the passage.




Research Question 9: How long does it take for students to read through complex texts, 
performance tasks, etc.? Is timing affected by the way students are presented the passage and 
items? 


One way of making items more difficult is to increase their complexity. Complex items often take 
longer to solve or answer. In computer adaptive tests, added complexity may decrease the time a 
high-ability student has to complete the test if the items are made more difficult through increased
complexity. This potentially creates some fairness issues in an adaptive test if there is a time limit on 
the test. This question was designed to assess the time it takes for students to answer complex and 
simpler items. Complexity was defined as a function of the DOK demanded by the test question. It 
was hypothesized that more complex tasks would take more time. 


Each ELA form had six items. These items varied in item complexity (simple or complex) and item 
format (SR, TE, or CR). The TE items were all “hot text” items. These items require the student to 
either highlight the text or drag the text to answer the item. 


Forms were constructed in ELA at two grade bands: grade band 3-5 (referred to as grade band 3) 
and grade band 6 and 7 (referred to as grade band 6). Two forms were administered in grade band 3.
One form was administered in grade band 6. 


Results 


Eight students took the grade band 3 forms with four students taking each form, and two students 
took the grade band 6 form. 


Table 33 presents the average time (in seconds) a student took to answer an item. SR items were
answered in the shortest time. “Hot text” (HT) items took about one minute longer than the SR items. CR items
took the most time to answer, about 75 seconds longer than the “hot text” items. With the exception 
of the complex CR item administered to grade band 6 students, item complexity did not seem to 
have an impact on item performance. (An interviewer commented, “Student took about the same 
time for complex and easy items.”) 




Table 33. Average Time (in seconds) to Answer an Item by Grade Band, Item Type, and Item 
Complexity 


Columns: Item Type and Item Complexity, by Grade Band. [Values not recoverable from the source.]


Table 34 presents a summary of the average time students took to complete complex and simple 
items across item types by grade band. Complex items seemed to have more impact in grade band 6, 
but there is no evidence that complex items, as defined here, take longer than simpler items. 


Table 34. Interviewer’s Summary of Item Timing by Grade Band and Item Difficulty 
Item Difficulty | Grade Band 3 | Grade Band 6
Complex | [not recoverable] | [not recoverable]
Simple | 104 | 115


Summary 


It was hypothesized that more complex items would take longer to complete than simpler items. No
evidence was found to support this hypothesis. In terms of the time spent on an item, SR items were
answered in the shortest time. “Hot text” items took about one minute longer than SR items. CR
items took the most time to answer, about 75 seconds longer than the “hot text” items.




Effective Communication of Mathematics 


Research Question 10: Working mathematics problems on computer: Communicating mathematics 
on computer—feasibility of measuring student understanding of items for Claims 2-4 on computer. 


With paper tests, some students write in their test books while working out mathematics problems.
When mathematics items are presented on computer, scratch paper is often provided if students 
want to transfer the problem to paper and work it out there. Because scratch paper is often 
destroyed after an online testing session, the degree to which scratch paper is used is not known; 
neither is the importance of scratch paper in working out a problem (or potentially for use in scoring). 
This research question examines the need for paper when solving mathematics problems. Forms 
were constructed at four grade bands: grade band 3 and 4 (referred to as grade band 3), grade band
6 and 7 (referred to as grade band 6), grade band 7 and 8 (referred to as grade band 7), and grade
band 11 (referred to as grade band 11) to investigate whether the scratch paper usage was uniform 
or varied by educational level. 


Each student was presented with three grade-appropriate items. The interviewer recorded whether 
the student made a comment, and the nature of the comment, while working the mathematics 
problems. The students first tried to work the problem without paper. Scratch paper was then offered 
to the student to rework the problem, if desired. The interviewer noted whether students chose to
add anything and noted the nature of the addition (more text, equations, graphics). Note
that there were only three comments for the third item in the lowest grade band (grade band 3).


Results 


Twenty students were administered the grade band 3 form, 37 students were administered the 
grade band 6 form, 21 students were administered the grade band 7 form, and 19 students were 
administered the grade band 11 form. 


Table 35 shows the percentage of comments made for an item and the type of comment made. Two
types of comments were of interest: whether students who wanted paper wanted it to draw a picture
or write an equation, and whether they found the online system difficult to use. The lowest grade band
students (grade band 3) did not need paper to solve any of the problems (Table 35). Some students in the highest
grade band (grade band 11) commented that they would like to draw a picture for the items they
were administered (15-30 percent). (“I wanted to graph the area.”) There was also one item (Item 2)
for which about 15 percent of students wanted paper to write equations. About 5-10 percent of
students in each grade band found the online system difficult to use. (“Confused me, I didn’t know
how to write an equation,” “Tried the keypad, but it wouldn't work,” “It was much easier with paper.”)
The strongest result came from the grade band 6 and grade band 7 groups, where 30 to 42 percent
of the sample, respectively, indicated that they wanted to write an equation. Between 3 and 23
percent of the grade band 6 and 7 groups also indicated that they wanted to draw a picture. This
may be a function of newly introduced algebra concepts for this group.




Table 35. Percentage of Comments for an Item, by Question Type and Grade Band 


Columns: Item and Question Type, by Grade Band. [Values not recoverable from the source.]


Table 36 shows the nature of the student comments made on paper and whether the additional 
information recorded on the paper improved the response according to the rubric. For all grade 
bands the additional information recorded on the paper included a graphic. In grade bands 6, 7, and 
11, the additional information recorded on paper included an equation. The grade band 6, 7, and 11 
groups provided additional information on paper that improved the response according to the rubric. 
For example, one administrator noted, “When given paper, she was able to do the proper equation 
and solve for x. She was more confident with paper and pencil.” The number of cases in which 
improvement was observed varied by item. For grade band 6, item 2, about 11 percent of the 
responses were improved when scratch paper information was taken into account during scoring. 
For grade band 11, item 3, about 16 percent of the responses were improved when scratch paper 
information was taken into account during scoring. Responses to all items in grade band 7 were 
improved when scratch paper information was taken into account. The improvement for this group 
ranged between 10 and 20 percent across items. 




Table 36. Percentage of Changes Made When Paper Was Introduced 


Columns: Grade Band. Rows include: Addition Improved Response According to Rubric. [Values not recoverable from the source.]


The interviewer’s comments suggested that most students in grade band 3 (75 percent) and grade
band 11 (63 percent) were able to accurately respond to the mathematics items they saw using only
the online text editor. However, fewer than half of the students in grade band 6 (45 percent) could
accurately respond to questions using only the text editor, and only 13 percent of the students in
grade band 7 were observed to be able to accurately respond to questions using only the text editor.
One student commented, “It’s much easier with paper.” 


Summary 


The general conclusion is that a subset of students benefits from being able to work mathematics
problems on paper. It appears to be especially important when students are beginning to learn 
algebra concepts. 


Grade band 3 students did not need paper to work the problems. However, in the grade band 6 and 
grade band 7 groups, 30-42 percent of students indicated that they wanted to write an equation. In 
grade bands 6, 7, and 11, the additional information recorded on paper would have improved the
response according to the rubric. Responses for specific items in grade bands 6 and 11 were
improved for 15 percent of the students, and responses for all items in grade band 7 were improved
when information on the scratch paper was taken into account. Improvement for this group ranged 
between 10 and 20 percent of the responses. This was supported by interviewer observations. About 
5-10 percent in each grade band found the online system difficult to use, but few specifics were 
recorded. 


Research Question 11: Usability of equation editor tool—can students use the tool the way it is
meant to be used?


Although students begin to use technology at a very early age, it is prudent to verify that young 
students are able to use the assessment interface to be used during testing. This question sought to 
evaluate the ability of grade 3-5 students to use the equation editor tool to be included in the 
Smarter Balanced delivery system. Three mathematics items were presented to the students (N=33). 
The first item only required the student to copy his or her response. The second item was a simple 
mathematics item and the third item was a more challenging mathematics item. The first item would 
demonstrate whether the student could use the equation editor tool. The second and third items 
would provide evidence of whether the ability to use the tool interacted with item difficulty. 


Results 


Between 15 and 30 percent of the students indicated that they had difficulty using the equation 
editor. About 30 percent had trouble just copying the answer, as required by item 1. The examiners 
assessed that 35 percent had difficulty using the equation editor and that only 40-57 percent of the 
students would get a given item correct. Students had more difficulty with the more challenging 
items. A summary of representative comments made by students about the equation editor during
the administration of the think-aloud protocol is presented below: 


1. Clicked on the + sign, but it didn’t work, twice.
2. How do I choose the numbers?
3. I needed paper to make a picture.
4. How do I use the number pad?
5. I tried to use the numbers on the keyboard, but wouldn't work.
6. Some symbols didn’t respond to first click.
7. I had trouble getting bottom half of fraction to record.
8. Unclear what possible value meant.
9. I didn’t see decimal point down there [due to scrolling].
10. Couldn’t find x symbol.
11. Unclear whether to click and drag or type.
12. Would rather type than use a mouse.
13. Difficult to use fraction tool.


Summary 


Elementary students had some difficulty using the equation editor. Between 15 and 30 percent of 
the students indicated that they had difficulty using the equation editor. The examiner’s assessment
concurred that about 35 percent had difficulty using the equation editor and that about 50 percent
of the students would get a given item correct. 


Research Question 12: Intuitive understanding of the relationships in multiplying fractions. 


This question is designed to assess whether students with a strong understanding of fractions and 
the multiplication and division of fractions complete the items without performing the indicated 
multiplication. The task asked students to compare the size of a product to the size of one factor, on 
the basis of the size of the other factor, without performing the indicated multiplication. Also of 
interest was whether students who complete an item as intended (without using multiplication) 
spent less time on an item than those who did not. To investigate this question a single form was 
administered for grades 3-5. 
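
The reasoning the task targets can be illustrated with a short sketch: for a positive factor, the product is larger than that factor when the other factor is greater than 1, equal when it is exactly 1, and smaller when it is less than 1. The function names below are illustrative assumptions, and the example fractions are not items from the study.

```python
from fractions import Fraction


def compare_product_to_factor(factor: Fraction, other: Fraction) -> str:
    """Compare factor * other with factor WITHOUT multiplying.

    Sketch assumption: factor is positive and other is non-negative.
    """
    if factor <= 0 or other < 0:
        raise ValueError("sketch assumes a positive factor and a non-negative multiplier")
    if other > 1:
        return "greater"
    if other == 1:
        return "equal"
    return "less"


def compare_by_multiplying(factor: Fraction, other: Fraction) -> str:
    """Reach the same answer the long way, by computing the product."""
    product = factor * other
    if product > factor:
        return "greater"
    if product == factor:
        return "equal"
    return "less"


# Hypothetical examples, not items from the study.
examples = [
    (Fraction(3, 4), Fraction(5, 2)),
    (Fraction(3, 4), Fraction(1)),
    (Fraction(3, 4), Fraction(2, 3)),
]
for factor, other in examples:
    print(factor, "x", other, "->", compare_product_to_factor(factor, other))
```

The two functions agree on every example, which is the point of the task: a student who understands the relationship can answer by inspecting the second factor alone.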


Results 


The form was administered to 33 students at the elementary level. Table 37 compares those with a 
strong understanding of fractions with those who do not have a strong understanding of fractions 
and whether they completed the task with or without using multiplication. There does not appear to 
be a relationship between strength of understanding of fractions (multiplication and division) and 
whether they used multiplication to solve the problems. 


Table 37. Strength of Understanding of Fractions and Whether Multiplication was Performed 


Columns: Item Number | Performed Multiplication | Did Not Perform Multiplication, reported separately for students with and without a strong understanding of fractions. [Values not recoverable from the source.]


Table 38 presents descriptive statistics for the timing of each item (in seconds). In addition to means, 
medians are reported because timing distributions tend to be highly skewed. On average, those who 
did not have to perform the multiplication completed the items in less time. The results for item 6 
were comparable for the two groups. 
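
The reason for reporting medians alongside means can be seen in a small sketch. The times below are hypothetical values chosen for illustration and are not data from the study; one very slow response pulls the mean well above the median.

```python
import statistics

# Hypothetical item times in seconds (NOT data from the report): most students
# finish in about a minute, but one student takes far longer.
times = [45, 50, 55, 60, 62, 70, 480]

mean_time = statistics.mean(times)
median_time = statistics.median(times)

print(f"mean   = {mean_time:.1f} seconds")
print(f"median = {median_time} seconds")
```

With a long right tail like this, the median is closer to what a typical student experienced, while the mean is dominated by the slowest response.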




Table 38. Comparison of the Time to Complete the Item for Those Who Did not Use Multiplication to 
Solve the Item and Those Who Did 


Performed Multiplication | Did Not Perform Multiplication


136 136 | 90 | 144 
10 


110 | 89 | 30-336 
123 | 114 | 70 |24-480| 88 | 69 | 57 | 25-195 
(95 | 28-480| 79 | 67 | 68 | 9-485 


Table 39 shows the percentage of students answering the item correctly. The students tested 
generally found the items to be difficult. (“Multiplying fractions was hard.”) Some students did not 
understand the inequality signs, while others did not understand improper fractions or how to make 
a whole number into a fraction. One interviewer commented that the “student had little or no 
understanding of fractions.” 


Table 39. Percentage of Students Answering an Item Correctly. 


Item Number | Percent
5 | 26
6 | 33


About 69 percent of the students used multiplication to solve the problems (Table 40). Student 
comments support this. “I multiplied... each box and put them in the correct boxes (columns).” “I 
timesed [sic] the numbers.” “I looked at each number expression and multiplied it in my head and
moved it to where I thought it was right.” “Some numbers on the bottom depends on the top number
which is bigger or smaller.” Only about 40 percent of the students understood fractions or at least 
the multiplication of fractions. The examiner’s comments (Table 41) concur with this conclusion. 




Table 40. Percentage of Students Using Multiplication to Solve the Items 


Item Number 


Table 41. Interviewer’s Assessment of: (1) Whether the Student Used Multiplication and (2) Whether 
the Student Had a Strong Understanding of Fractions 


Summary | Percent
Did student use multiplication? | [not recoverable]
Did student have a strong understanding of fractions (multiplication/division)? | 40


Summary 


There seemed to be little relationship between whether a student has a strong understanding of the 
multiplication and division of fractions and whether he or she used multiplication to solve the items. 
However, students who did not have to perform the multiplication completed the items in less time 
than students who had to perform the multiplication. While most students said they understood the 
questions, 70 percent had to use multiplication to solve them. Only about 40 percent of the students
had a firm understanding of the multiplication/division of fractions, according to the interviewers. 


Special Populations 


Research Question 13: Contextual glossaries are item-specific glossaries that provide a definition of 
a word that is targeted to, and appropriate for, the context in which the word is used in the item. Are 
these a fair and appropriate way to support students who need language support? 


This question addressed the efficacy of the use of contextual glossaries with non-native (Spanish)
speakers (see Exhibit 4 for an example of a contextual glossary item) when solving mathematics
problems. A contextual glossary item contains highlighted words when presented online. Clicking any 
of these highlighted items produces a list of all highlighted words in the item with Spanish definitions 
for each. Two sets of items were created that were parallel in difficulty. The first set of items
contained no contextual glossaries; only single words were translated. The second set of items
contained contextual glossaries. The interviewer was asked to determine whether the student was
having trouble understanding a word and whether the contextual glossary aided in the 
interpretation of the word or sentence. 


Only three ELL students participated: one from grade 3 and two from grade 6. 




Exhibit 4. Example of a Contextual Glossary Item 


1. A roller coaster has a large rise and drop followed by a complete circle. The following diagram 
shows measurements for the track. An extra 20 feet are needed for cutting and welding. How many 
feet of track should be ordered? (Use π = 3.14)


A. 280 feet 
B. 407 feet 
C. 415.6 feet 


D. 1,537.4 feet 


Glossary Window


Roller coaster: montaña rusa
Rise: subida
Drop: bajada, caída
Complete: completo, entero
Diagram: diagrama, esquema, gráfico
Track: vía, riel
Cutting: cortar
Welding: cortar




Results 


The grade 3 student had trouble understanding a few items, but had few word confusions. For the 
second set of items, this student used the contextual glossaries for one item but not for the other 
items. The student said that there was not a problem understanding the items because the student 
used “sentence context” to answer them, or the words the student didn't know weren't in the 
glossary so the student stopped using it. In terms of scoring, this student answered two of the three 
“translated” items correctly, but did not answer any of the “contextual glossary” items correctly, so 
it is difficult to tell from these results whether the contextual glossaries aided the student’s
performance.


The two grade 6 students (one ELA form and one Math form) both had difficulty with the “translated” 
items in the first set with six or more word confusions each for most items. Both students found the 
contextual glossary useful to some degree, though not for all items. (“The words I don’t know aren’t
in the glossary.”) However, the interviewers suggested that the use of the contextual glossary 
improved the performance for both grade 6 students. Though the ELA student got all questions 
incorrect, the interviewer believed that this was mainly due to careless mistakes and that the 
student used the glossary to help make sense of the key components of the questions and 
understood the procedures for answering the questions. The math student got two-thirds of the 
items correct when the items were translated, and one-third of the items correct when the contextual 
glossary was used. The student had difficulty understanding an essential word in one of the incorrect 
items. However, the interviewer commented that once he understood the words, he could confidently 
work on the problem and he knew how to proceed. 


Summary 


In summary, contextual glossaries appeared to be somewhat effective when they were used, but the 
impact was not always reflected in the score the student received for an item. The contextual 
glossaries appeared to be incomplete in that they did not include words that the students needed. 
This limited the use of the glossaries in these situations. The interviewers’ comments suggested that
performance was improved when the students used the contextual glossaries.




Research Question 14: Under what conditions do students with lower reading ability use text-to- 
speech (TTS) to help focus on content in ELA and mathematics? Is this affected by the quality of 
the voice-pack? 


TTS is a technology that can give students with low reading ability access to an assessment. For this 
technology to be effective the language produced from the voice-pack must be clear so that it can be 
understood. This is particularly true for non-native speakers of English. 


This question is designed to assess whether students with lower reading ability and non-native 
speakers of English use TTS to help focus on content in ELA and mathematics. Only students familiar 
with TTS were included in the study. Overall, 77 students used TTS at least once. Among them, 58
were LEP students, 13 had reading difficulties (IEP), and six were Gen Ed
students.


Forms were constructed at three grade bands: grade 3 (referred to as grade band 3), grades
6 and 7 (referred to as grade band 6), and grade 11 (referred to as grade band 11). In ELA,
four forms were administered with both high- and low-quality voice-packs. In mathematics, two forms 
were administered in grade bands 3 and 11. Only a single form was administered in grade band 6. 
For all mathematics forms only high-quality voice-packs were administered. In Tables 42-45, yellow 
shading denotes the use of high-quality voice-packs while a white background denotes the use of a 
low-quality voice-pack. 


Results 


For ELA (Table 42), for all groups and grade bands, a high percentage of students tended to make 
comments indicating an improved focus on the content when the voice-pack was of high quality. 
About one-third of the students (except the Gen Ed grade band students) indicated that TTS kept 
their focus on content even when low-quality voice-packs were used. For ELA, students in all groups 
tended to make greater use of TTS when the voice-pack was of high quality. 


About 50 percent of the LEP students in mathematics in grade bands 3 and 11 made comments 
indicating that TTS helped them focus on content. All of the LEP grade band 6 group and the IEP 
students in grade band 3 found that TTS helped them focus on content. (“It made me think about 
the question.”) The Gen Ed students in grade band 3 found that TTS helped them focus on content; 
however, the Gen Ed grade band 6 students did not find TTS useful. 




Table 42. Percentage of TTS Students Who Made Any Comment Indicating That He/She Is Mainly Focused on the Content of the Item, by 
Content, Voice-Pack Quality, Sample, and Grade Band 


[Table 42 body not recoverable from the source. Legible headings: Content, Voice Pack, Grade Band, LEP, Gen Ed.]




Table 43 shows the percentage of students who answered the items correctly, averaged across items. In ELA, the grade band 6 and 11 
LEP students and the grade band 3 IEP students found the items more difficult using a low-quality voice-pack. The Gen Ed grade band 6 
ELA students were not administered a high-quality voice-pack. In the LEP grade band 6 group, about half the students answered an item
correctly using the high-quality voice-pack. The percentage answering an item correctly was close to 75 percent for the other LEP grade 
bands and the grade band 3 low-level reading students when the high-quality voice-pack was used. 


In mathematics, in grade band 3, about 40 percent of the LEP students answered an item correctly. For the other grade bands, for the LEP 
and IEP samples, no items were answered correctly, even with the high-quality voice-packs. This was also true for the Gen Ed grade band 3 
students. However, the general education students in grade band 6 answered all the items correctly. 


Table 43. Percentage of TTS Students Who Answered the Items Correctly by Content, Voice-Pack Quality, Sample, and Grade Band 


[Table 43 body not recoverable from the source. Legible heading: Mathematics.]




Tables 44 and 45 summarize the interviewer’s assessment for ELA and mathematics related to whether TTS improved access to the 
content or was a distraction. TTS improved access in ELA regardless of the quality of the voice-pack. Greater access was achieved when 
high-quality voice-packs were used in ELA except in grade band 11. This is probably an artifact of the very small sample size. The low-quality 
voice-pack appeared less effective at providing access and was distracting in ELA, where the high-quality voice-pack was not distracting at 
all. One student said, “[I] didn’t like using TTS ... the sound was robotic and would break my concentration.” 


In mathematics, TTS helped to improve access for some grade band 3 LEP students, but not for middle- and upper-level LEP students or the 
IEP or Gen Ed grade band 3 students. All the Gen Ed, IEP, and grade band 6 LEP students found the high-quality voice-pack distracting in 
mathematics. This was in part a function of trying to describe a table verbally. (“When TTS read the chart aloud, I got lost in the numbers
and couldn’t figure out what the question was asking.”) 


Table 44. Assessment by the Interviewer of the Percentage of TTS Students Whose Access to Content Was Improved by the Use of TTS by 
Content, Voice-Pack Quality, Sample, and Grade Band 


[Table 44 body not recoverable from the source. Legible headings: Content (ELA, Mathematics), Voice Pack, Grade Band, LEP.]




Table 45. Assessment by the Interviewer of the Percentage of TTS Students Who Were Distracted by TTS, by Content, Voice-Pack Quality, 
Sample, and Grade Band 


[Table 45 body not recoverable from the source. Legible headings: Content (Mathematics), Voice Pack Quality.]




Summary 


TTS improved access in ELA regardless of the quality of the voice-pack. Greater access was achieved 
when high-quality voice-packs were used. LEP students and students with reading difficulties tended 
to benefit more from the use of TTS. Using TTS with high-quality voice-packs improved focus on 
content in ELA. The use of TTS with low-quality voice-packs tended to distract students in ELA, 
whereas high-quality voice-packs did not. In mathematics, access was improved only for grade
band 3 students. All the Gen Ed, IEP, and grade band 6 LEP students found the high-quality
voice-pack distracting. This was in part a function of trying to describe a table verbally.


Final Summary 


Smarter Balanced is moving toward an assessment model that is largely scored automatically and 
delivered adaptively on computer. The Smarter Balanced cognitive laboratories were conducted to 
investigate questions that arise from such an automated design. While think-aloud protocols are 
time consuming, they have the potential to provide a level of information not easily accessed through 
large-scale studies. However, the sample sizes are small. Therefore, should a more rigorous 
investigation of any of the research questions be of interest, specifically designed studies with large 
samples will be needed. 


This report presents the results from 14 small think-aloud studies that addressed topics that pertain 
to an automated test delivery system. 


1. Can non-constructed-response item formats assess components that have historically been 
believed to be measured only with CR items? 

2. What is the optimal amount of direction to provide for TE items? Does this vary with grade 
level? 

3. What is the appropriate degree of labeling to provide for MPSR items so that students know 
to complete all parts? 

4. Does it matter whether items associated with a passage are presented in a single block or 
presented one item at a time? Are ELL students impacted by these different arrangements? 

5. Do the longer passages favored by Smarter Balanced reduce student engagement? 

6. How much time do items in different formats take to answer? Are ELL students affected 
more than general education students? 

7. In mathematics, could information captured on scratch paper facilitate the working of a
problem and benefit the performance and scoring of a student? 

8. Do contextual glossaries help improve the performance of students with language 
disabilities? 

9. Does TTS help focus students of low reading ability on the content of an item? 

10. Can younger students effectively use the equation editor? 

11. Mathematics intuition: Can students compare the size of a product to the size of one factor, 
on the basis of the other factor without multiplying? 


On the whole, the cognitive laboratories were successful in providing answers to most of these 
questions. They provide a glimpse of issues that may exist and need to be investigated further. To 
investigate these issues more completely, larger-scale studies should be conducted. 




Appendix A 


Question 2. Full Claim Descriptions 


Content  Grade  Claim Description

ELA             Students can read closely and analytically to comprehend a range of increasingly complex literary and informational text.
ELA             Students can produce effective writing for a range of purposes and audiences.
ELA             Students can employ effective speaking and listening skills for a range of purposes and audiences.
ELA             Students can engage in research/inquiry to investigate topics and to analyze, integrate, and present information.

ELA             Students can read closely and analytically to comprehend a range of increasingly complex literary and informational texts.
ELA             Students can produce effective writing for a range of purposes and audiences.
ELA             Students can employ effective speaking and listening skills for a range of purposes and audiences.
ELA             Students can engage in research/inquiry to investigate topics and to analyze, integrate, and present information.

ELA      9-12   Students can read closely and analytically to comprehend a range of increasingly complex literary and informational texts.
ELA      9-12   Students can produce effective and well-grounded writing for a range of purposes and audiences.
ELA      9-12   Students can employ effective speaking and listening skills for a range of purposes and audiences.
ELA      9-12   Students can engage in research/inquiry to investigate topics, and to analyze, integrate, and present information.

Math     3-5    Students can explain and apply mathematical concepts and interpret and carry out mathematical procedures with precision and fluency.
Math     3-5    Students can solve a range of well-posed problems in pure and applied mathematics, making productive use of knowledge and problem-solving strategies.
Math            Students can clearly and precisely construct viable arguments to support their own reasoning and to critique the reasoning of others.
Math            Students can analyze complex, real-world scenarios and can construct and use mathematical models to interpret and solve problems.

Math            Students can explain and apply mathematical concepts and carry out mathematical procedures with precision and fluency.
Math            Students can solve a range of well-posed problems in pure and applied mathematics, making productive use of knowledge and problem-solving strategies.
Math            Students can clearly and precisely construct viable arguments to support their own reasoning and to critique the reasoning of others.
Math            Students can analyze complex, real-world scenarios and can construct and use mathematical models to interpret and solve problems.

Math            Students can explain and apply mathematical concepts and carry out mathematical procedures with precision and fluency.
Math     9-12   Students can solve a range of well-posed problems in pure and applied mathematics, making productive use of knowledge and problem-solving strategies.
Math     9-12   Students can clearly and precisely construct viable arguments to support their own reasoning and to critique the reasoning of others.
Math     9-12   Students can analyze complex, real-world scenarios and can construct and use mathematical models to interpret and solve problems.




Question 2. Full Target Descriptions 


Content  Grade  Grade Band  DOK  Claim  Target  Target Description


Write or revise one or more 
informational/explanatory paragraphs 
demonstrating ability to organize ideas by 

ELA 3 3 2 2 3 stating a focus, including appropriate 
transitional strategies for coherence, or 
supporting details, or an appropriate 
conclusion. 


Use supporting evidence to justify 
interpretations or analyses of information 
presented or how information is integrated 
within a text (point of view; interactions among 
events, concepts, people, or ideas; author’s 
reasoning and evidence). 


ELA 2 1 11 


Use supporting evidence to justify 
interpretations or analyses of information 
presented or how information is integrated 
within a text (point of view; interactions among 
events, concepts, people, or ideas; author’s 
reasoning and evidence). 


ELA 3 1 11 


Apply a variety of strategies when writing or 
revising one or more paragraphs of 
informational/explanatory text organizing ideas 
by stating and maintaining a focus/tone, 

ELA { 2 2 3 providing appropriate transitional strategies for 
coherence, developing a topic including relevant 
supporting evidence/vocabulary and 
elaboration, or providing a conclusion 
appropriate to purpose and audience. 


ELA 7 7 5 4 4 Identify explicit textual evidence to support
inferences made or conclusions drawn. 
Identify explicit textual evidence to support 
v6 2 1 1 
inferences made or conclusions drawn. 


Apply a variety of strategies when writing or 
revising one or more paragraphs of 
informational/explanatory text organizing ideas 
by stating and maintaining a focus/tone, 

ELA { 2 2 providing appropriate transitional strategies for 
coherence, developing a topic including relevant 


supporting evidence/vocabulary and 
elaboration, or providing a conclusion 
appropriate to purpose and audience. 


Cite explicit textual evidence to support 
ELA 11 11 2 1 1 inferences made or conclusions drawn about 
texts. 




Determine or analyze the figurative (e.g.,
euphemism, oxymoron, hyperbole, paradox), or 
ELA 11 11 2 1 { connotative meanings of words and phrases 

used in context and the impact of those word 
choices on meaning and tone. 

MATH Develop understanding of fractions as numbers. 

MATH A 3 1 r Extend understanding of fraction equivalence 
and ordering. 

MATH A A 1 1 Draw and identify lines and angles, and classify 
shapes by properties of their lines and angles.

MATH A A A A Apply mathematics to solve problems arising in 
everyday life, society, and the workplace. 


Content DOK Claim Target Target Description 


MATH 5 4 ; Apply and extend previous understandings of 
arithmetic to algebraic expressions. 
Apply mathematics to solve well-posed 

MATH 3 2 A problems arising in everyday life, society, and 
the workplace. 

MATH 11 11 5 1 Solve equations and inequalities in one
variable. 
Apply mathematics to solve well-posed 

MATH 11 11 2 2 A problems arising in everyday life, society, and 
the workplace. 
Apply mathematics to solve well-posed 

MATH 11 11 3 2 A problems arising in everyday life, society, and 
the workplace. 




Appendix B 
Demographic Information for Cognitive Laboratories 
Total Number of Students: 774 


By Cognitive Lab Location: 

San Francisco, California: 80 (10%) 
Monterey, California: 167 (22%) 
Waterbury, Connecticut: 45 (6%) 
Hartford, Connecticut: 26 (3%) 
Pocatello, Idaho: 64 (8%) 

District of Columbia: 31 (4%) 
Honolulu, Hawaii: 43 (6%) 

East Lansing, Michigan: 63 (8%) 
Madison Heights, Michigan: 33 (4%) 
Marquette, Michigan: 30 (4%) 

Des Moines, Iowa: 52 (7%)
Pittsburgh, Pennsylvania: 76 (10%) 
Columbia, South Carolina: 50 (6%) 
Portland, Oregon: 14 (2%) 


By School Location: 
California: 243 (31%) 
Connecticut: 71 (9%) 
District of Columbia: 14 (2%) 
Hawaii: 43 (6%) 

Idaho: 64 (8%) 

Iowa: 52 (7%)
Maryland: 12 (2%) 
Michigan: 126 (16%) 
Nevada: 4 (<1%) 
Oregon: 12 (2%) 
Pennsylvania: 76 (10%) 
South Carolina: 50 (6%) 
Virginia: 5 (<1%) 
Washington: 2 (<1%) 


By Grade: 

Grade 3: 113 (15%) 
Grade 4: 100 (13%) 
Grade 5: 79 (10%) 
Grade 6: 98 (13%) 
Grade 7: 113 (15%) 
Grade 8: 62 (8%)
Grade 9: 87 (11%)
Grade 10: 70 (9%) 
Grade 11: 44 (6%) 
Grade 12: 8 (1%) 


By Gender: 
Male: 393 (51%) 
Female: 381 (49%) 


Language(s) Spoken at Home: 


English: 670 (87%) 
Spanish: 100 (13%) 
Chinese: 46 (6%) 
Chaldean: 21 (3%) 
Arabic: 18 (2%) 
Albanian: 15 (2%) 
Tagalog: 10 (1%) 
German: 5 (<1%) 
Vietnamese: 5 (<1%) 
Hindi: 4 (<1%) 
Korean: 4 (<1%) 
Japanese: 3 (<1%) 
Samoan: 3 (<1%) 
Bengali: 2 (<1%) 
Greek: 2 (<1%) 
Ilocano: 2 (<1%)
Telugu: 2 (<1%)
Other: 14 (2%) 




*Total percentage is more than 100% because more than one response could be selected. 
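
The note above can be checked with a short calculation. The following is a minimal sketch, assuming each percentage is the count divided by the 774 participating students; the counts are copied from the "Language(s) Spoken at Home" list, while the variable and function names are illustrative and not part of the report.

```python
# Minimal sketch (illustrative names): why multi-select percentages exceed 100%.
# Assumption: each percentage is count / total students, not count / total responses.
TOTAL_STUDENTS = 774

# Counts copied from the "Language(s) Spoken at Home" list.
home_language_counts = {
    "English": 670, "Spanish": 100, "Chinese": 46, "Chaldean": 21,
    "Arabic": 18, "Albanian": 15, "Tagalog": 10, "German": 5,
    "Vietnamese": 5, "Hindi": 4, "Korean": 4, "Japanese": 3,
    "Samoan": 3, "Bengali": 2, "Greek": 2, "Ilocano": 2,
    "Telugu": 2, "Other": 14,
}


def percent_of_students(count, total=TOTAL_STUDENTS):
    """Return the share of all students who selected a given response."""
    return 100.0 * count / total


# A household can report more than one language, so responses outnumber students.
total_responses = sum(home_language_counts.values())
percent_sum = sum(percent_of_students(c) for c in home_language_counts.values())

print(f"responses: {total_responses}, students: {TOTAL_STUDENTS}")
print(f"sum of percentages: {percent_sum:.1f}%")
```

Because the denominator is the number of students rather than the number of responses, the shares add up to more than 100% whenever any household lists more than one language.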


Language(s) Most Frequently Spoken: 


English: 707 (91%) 
Arabic: 22 (3%) 
Chinese: 18 (2%) 
Chaldean: 16 (2%) 
Spanish: 13 (2%) 
Albanian: 3 (<1%) 
Greek: 2 (<1%) 
Tagalog: 2 (<1%) 
Other: 7 (1%) 


*Total percentage is slightly over 100% because some parents added an additional language in the
comment section.




Type of School: 

Public: 681 (88%) 
Private: 42 (5%) 
Charter: 18 (2%) 
Home School: 14 (2%) 
Parochial: 13 (2%) 
Other: 4 (<1%) 


Access to a Computer at Home: 
Yes: 747 (97%) 
No: 27 (3%) 


Frequency of Computer Use: 

Almost every day or every day: 438 (57%) 
Three or four times per week: 175 (23%) 
Once or twice per week: 146 (19%) 
Never: 15 (2%) 


Frequency of Internet Use: 

Almost every day or every day: 401 (52%) 
Three or four times per week: 189 (24%) 
Once or twice per week: 166 (21%) 
Never: 18 (2%) 


Computer Classes: 
Yes: 385 (50%) 
No: 321 (41%) 
Unsure: 68 (9%) 


IEP: 

Yes: 87 (11%) (e.g., ADHD, Dyslexia, Emotional Disturbance, Gifted, Hearing Loss, High Functioning 
Asperger’s, Impaired/Slow Learning, Auditory Processing Disability, Orthopedic Impairment, Speech 
and Language, Speech Impairment) 

No: 631 (82%) 

Unsure: 56 (7%) 


Testing Accommodations: 

Yes: 83 (11%) (e.g., Paper Test, Printable Test, Student can take test in another language, ELD, 
Limited English Proficiency, Listen to questions on tape and use bilingual dictionary, Supervised 
breaks and additional time, Assessments can be read, Assessments one on one with administrator, 
Cantonese Bilingual Pathway Instruction, Extra time and modified questions, Extended response 
time, Separate room) 

No: 647 (84%) 

Unsure: 42 (5%) 




There is no assessment program at this grade level: 1 (<1%) 
Child does not participate in the school’s testing or assessment program: 1 (<1%) 


ELA Grades: 

Above Average: 375 (48%) 

Average: 324 (42%) 

Below Average: 51 (7%) 

Unsure: 20 (3%) 

*Not all participants responded to this question. 


Mathematics Grades: 

Above Average: 392 (51%) 

Average: 311 (40%) 

Below Average: 55 (7%) 

Unsure: 14 (2%) 

*Not all participants responded to this question. 


Ethnic/Cultural Breakdown: 

White: 493 (64%) 

Hispanic: 137 (18%) 

Asian: 125 (16%) 

Black/African American: 76 (10%) 

Native Hawaiian or Other Pacific Islander: 28 (4%) 
American Indian or Alaskan Native: 17 (2%) 
Filipino: 12 (2%) 

Asian Indian: 5 (<1%) 

Other: 3 (<1%) 

*Total percentage is over 100% because more than one response could be selected. 


Household Income: 

Under $25,000: 135 (17%) 

Between $25,001 and $50,000: 170 (22%) 
Between $50,001 and $75,000: 139 (18%) 
Between $75,001 and $100,000: 145 (19%) 
Between $100,001 and $150,000: 110 (14%) 
Over $150,001: 54 (7%) 




Appendix G: Usability, Accessibility, and Accommodations Guidelines




Smarter Balanced 
Assessment Consortium: 
Usability, Accessibility, and 
Accommodations Guidelines 


Prepared with the assistance of the 
National Center on Educational Outcomes 


August 1, 2014 




Table of Contents 


Introduction
    Intended Audience and Recommended Use
    Smarter Balanced Assessment Design
    Recognizing Access Needs in All Students
    Structure of This Document

Section I: Smarter Balanced Universal Tools
    What Are Universal Tools?
    Embedded Universal Tools
    Non-embedded Universal Tools

Section II: Smarter Balanced Designated Supports
    What Are Designated Supports?
    Who Makes Decisions About Designated Supports?
    Embedded Designated Supports
    Non-embedded Designated Supports

Section III: Smarter Balanced Accommodations
    What Are Accommodations?
    Who Makes Decisions About Accommodations?
    Embedded Accommodations
    Non-embedded Accommodations

References

Appendix A: Summary of Smarter Balanced Universal Tools, Designated Supports, and Accommodations

Appendix B: Research-based Lessons Learned about Universal Design, Accessibility Tools, and Accommodations
    Who might benefit from accessibility features identified by AA-MAS research?
    What changes can be made to test items and tests that do not change the construct being [remainder illegible in source]
    What can test developers do to build on the lessons learned from AA-MAS research and [remainder illegible in source]

[Final entry illegible in source]




Introduction 


The Smarter Balanced Assessment Consortium (Smarter Balanced) strives to provide every student 
with a positive and productive assessment experience, generating results that are a fair and 
accurate estimate of each student’s achievement. Further, Smarter Balanced is building on a 
framework of accessibility for all students, including English Language Learners (ELLs), students with 
disabilities, and ELLs with disabilities, but not limited to those groups. In the process of developing 
its next-generation assessments to measure students’ knowledge and skills as they progress toward 
college and career readiness, Smarter Balanced recognized that the validity of assessment results 
depends on each and every student having appropriate universal tools, designated supports, and 
accommodations when needed based on the constructs being measured by the assessment. This 
document was developed for the Smarter Balanced member states to guide the selection and 
administration of universal tools, designated supports, and accommodations. 


The Smarter Balanced assessment is based on the Common Core State Standards (CCSS). Thus, the 
universal tools, designated supports, and accommodations that are appropriate for the Smarter 
Balanced assessment may be different from those that states allowed in the past. For the secure 
summative assessments, a state can only make available to students the universal tools, designated
supports, and accommodations that are included in the Smarter Balanced Usability, Accessibility,
and Accommodations Guidelines. A member state may elect not to make available to its students
any universal tool, designated support, or accommodation that is otherwise included in the
Guidelines when the implementation or use of the universal tool, designated support, or 
accommodation is in conflict with a member state’s law, regulation, or policy. 


These Guidelines describe the Smarter Balanced universal tools, designated supports, and 
accommodations available for the Smarter Balanced assessments at this time (See Appendix A). The 
specific universal tools, designated supports, and accommodations approved by Smarter Balanced 
may change in the future if additional tools, supports or accommodations are identified for the 
assessment based on state experience and research findings. The Consortium will establish a
standing committee, including members from Governing States, that will review suggested additional
universal tools, designated supports, and accommodations to determine if changes are warranted.
Proposed changes to the list of universal tools, designated supports, and accommodations will be 
brought to Governing States for review, input, and vote for approval. Furthermore, states may issue 
temporary approvals (i.e., one summative assessment administration) for individual unique student
accommodations. State leads will evaluate formal requests for unique accommodations and 
determine whether or not the request poses a threat to the measurement of the construct. Upon 
issuing a temporary approval, the State will send documentation of the approval to the Consortium. 
The Consortium will consider all state approved temporary accommodations as part of the annual 
Consortium accommodations review process. The Consortium will provide to member states a list of 
the temporary accommodations issued by states that are not Consortium approved 
accommodations. 


Intended Audience and Recommended Use 


The Smarter Balanced Assessment Consortium’s Usability, Accessibility, and Accommodations 
Guidelines are intended for school-level personnel and decision-making teams, particularly 
Individualized Education Program (IEP) teams, as they prepare for and implement the Smarter 
Balanced assessment. The Guidelines provide information for classroom teachers, English
development educators, special education teachers, and related services personnel to use in
selecting and administering universal tools, designated supports, and accommodations for those 
students who need them. The Guidelines are also intended for assessment staff and administrators 
who oversee the decisions that are made in instruction and assessment. 


The Smarter Balanced Guidelines apply to all students. They emphasize an individualized approach 
to the implementation of assessment practices for those students who have diverse needs and 
participate in large-scale content assessments. This document focuses on universal tools, 
designated supports, and accommodations for the Smarter Balanced content assessments of 
English language arts/literacy and mathematics (math). At the same time, it supports important
instructional decisions about accessibility and accommodations for students who participate in the 
Smarter Balanced assessments. It recognizes the critical connection between accessibility and 
accommodations in instruction and accessibility and accommodations during assessment. 
Professional development materials that support the Guidelines and this critical instruction- 
assessment link will be available in the Spring of 2014. The Guidelines also are supported by the 
Smarter Balanced Test Administration Manual. 


Smarter Balanced Assessment Design 


The Smarter Balanced Assessment Consortium has developed a system of valid, reliable, and fair 
next-generation assessments aligned to the CCSS in English language arts (ELA)/literacy and 
mathematics for grades 3-8 and 11. The system includes summative assessments for accountability 
purposes, optional interim assessments for local use, and formative tools and processes for 
instructional use. Computer adaptive testing technologies are used for the summative and interim 
assessments to provide meaningful feedback and actionable data that teachers and other 
stakeholders can use to help students succeed. For more information, visit
www.smarterbalanced.org/smarter-balanced-assessments/.


Recognizing Access Needs in All Students 


All students (including students with disabilities, ELLs, and ELLs with disabilities) are to be held to 
the same expectations for participation and performance on state assessments. Specifically, all 
students enrolled in grades 3-8 and 11 are required to participate in the Smarter Balanced 
mathematics assessment except: 


• Students with the most significant cognitive disabilities who meet the criteria for the 
mathematics alternate assessment based on alternate achievement standards 
(approximately 1% or fewer of the student population). 


All students enrolled in grades 3-8 and 11 are required to participate in the Smarter Balanced 
English language arts/literacy assessment except: 


• Students with the most significant cognitive disabilities who meet the criteria for the English 
language arts/literacy alternate assessment based on alternate achievement standards 
(approximately 1% or fewer of the student population). 

• ELLs who are enrolled for the first year in a U.S. school. These students instead participate in 
their state’s English language proficiency assessment. 
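
The participation rules above can be sketched as a small decision function. This is only an 
illustration, not part of the Guidelines: the names Student and required_assessment, the field 
names, and the returned labels are all hypothetical, and the order in which the two exceptions are 
checked is an assumption, since the text does not say which applies when a student meets both. 

```python
from dataclasses import dataclass

# Grades named in the Guidelines: 3-8 and 11.
TESTED_GRADES = {3, 4, 5, 6, 7, 8, 11}


@dataclass
class Student:
    # Hypothetical record; field names are illustrative only.
    grade: int
    significant_cognitive_disability: bool = False  # meets alternate-assessment criteria
    ell_first_year_in_us_school: bool = False


def required_assessment(student: Student, subject: str) -> str:
    """Return which assessment applies for 'math' or 'ela' under the stated rules."""
    if subject not in ("math", "ela"):
        raise ValueError("subject must be 'math' or 'ela'")
    if student.grade not in TESTED_GRADES:
        return "not covered by this requirement"
    # Exception for both subjects: alternate assessment based on
    # alternate achievement standards.
    if student.significant_cognitive_disability:
        return "alternate assessment"
    # Exception for ELA only: first-year ELLs take the state's
    # English language proficiency assessment instead.
    if subject == "ela" and student.ell_first_year_in_us_school:
        return "English language proficiency assessment"
    return "Smarter Balanced"


if __name__ == "__main__":
    newcomer = Student(grade=5, ell_first_year_in_us_school=True)
    print(required_assessment(newcomer, "math"))
    print(required_assessment(newcomer, "ela"))
```

Note that the first-year ELL exception applies only to the English language arts/literacy 
assessment; the same student still takes the mathematics assessment. 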


Federal laws governing student participation in statewide assessments include the Elementary and 
Secondary Education Act (ESEA) (reauthorized as the No Child Left Behind Act of 2001 - NCLB), the 
Individuals with Disabilities Education Improvement Act of 2004 (IDEA), and Section 504 of the 
Rehabilitation Act of 1973 (reauthorized in 2008). 


Recognizing the diverse characteristics and needs of students who participate in the Smarter 
Balanced assessments, the Smarter Balanced states worked together through the Smarter Balanced 
Test Administration and Student Access Work Group to develop an Accessibility and 
Accommodations Framework that guided the consortium as it worked to reach agreement on the 
specific tools, supports, and accommodations available for the assessment. The Work Group also 
considered research-based lessons learned about universal design, accessibility tools, and 
accommodations (see Appendix B). 


The conceptual model that serves as the basis for the Usability, Accessibility, and Accommodations 
Guidelines is shown in Figure 1. This figure portrays several aspects of the Smarter Balanced 
assessment features - universal tools (available for all students), designated supports (available 
when indicated by an adult or team), and accommodations (available when need is documented in an 
Individualized Education Program - IEP or 504 plan). It also portrays the additive and 
sequentially-inclusive nature of these three aspects. Universal tools are available to all students, 
including those receiving designated supports and those receiving accommodations. Designated 
supports are available only to students for whom an adult or team has indicated the need for these 
supports (as well as those students for whom the need is documented). Accommodations 
are available only to those students with documentation of the need through a formal plan (i.e., IEP). 
Those students also may use designated supports and universal tools. 
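
The additive, sequentially-inclusive model described above can be illustrated with a short sketch. 
It is a simplification under stated assumptions, not a description of the test delivery system: the 
names Tier and may_use and the two boolean flags are hypothetical, and a real decision is made 
feature by feature rather than once per student. 

```python
from enum import Enum


class Tier(Enum):
    # The three categories of assessment features named in the Guidelines.
    UNIVERSAL_TOOL = "universal tool"
    DESIGNATED_SUPPORT = "designated support"
    ACCOMMODATION = "accommodation"


def may_use(tier: Tier, indicated_by_adult_or_team: bool, documented_in_plan: bool) -> bool:
    """Return True if a feature of the given tier is available to the student.

    indicated_by_adult_or_team: an educator or team has indicated the need.
    documented_in_plan: the need is documented in an IEP or 504 plan.
    (Both flags are hypothetical simplifications for illustration.)
    """
    if tier is Tier.UNIVERSAL_TOOL:
        # Universal tools are available to all students.
        return True
    if tier is Tier.DESIGNATED_SUPPORT:
        # Available when need is indicated, and also to students
        # whose need is documented in a formal plan.
        return indicated_by_adult_or_team or documented_in_plan
    # Accommodations require documentation in a formal plan.
    return documented_in_plan


if __name__ == "__main__":
    for tier in Tier:
        print(tier.value, may_use(tier, indicated_by_adult_or_team=True, documented_in_plan=False))
```

A student with a documented need passes all three checks, which reflects the statement that those 
students also may use designated supports and universal tools. 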


A universal tool for one content focus may be an accommodation for another content focus (see, for 
example, calculator). Similarly, a designated support may also be an accommodation, depending on 
the content target (see, for example, scribe). This approach is consistent with the emphasis that 
Smarter Balanced has placed on the validity of assessment results coupled with access. Universal 
tools, designated supports, and accommodations all yield valid scores that count as participation in 
statewide assessments when used in a manner consistent with the Guidelines. 


Also, as shown in Figure 1, for each category of assessment features - universal tools, designated 
supports, and accommodations - there exist both embedded and non-embedded versions of the 
tools, supports, or accommodations depending on whether they are provided as digitally-delivered 
components of the test administration system or separate from it. 




Figure 1: Conceptual Model Underlying the Smarter Balanced Usability, Accessibility, and 
Accommodations Guidelines 


Universal Tools 
Embedded: Breaks, Calculator, Digital Notepad, English Dictionary, English Glossary, Expandable 
Passages, Global Notes, Highlighter, Keyboard Navigation, Mark for Review, Math Tools, Spell Check, 
Strikethrough, Writing Tools, Zoom 
Non-embedded: Breaks, English Dictionary, Scratch Paper, Thesaurus 

Designated Supports 
Embedded: Color Contrast, Masking, Text-to-speech, Translated Test Directions, Translations 
(Glossary), Translations (Stacked), Turn off Any Universal Tools 
Non-embedded: Bilingual Dictionary, Color Contrast, Color Overlays, Magnification, Read Aloud, 
Scribe, Separate Setting, Translated Test Directions, Translation (Glossary) 

Accommodations 
Embedded: American Sign Language, Braille, Closed Captioning, Text-to-speech 
Non-embedded: Abacus, Alternate Response Options, Calculator, Multiplication Table, Noise 
Buffers, Print on Demand, Read Aloud, Scribe, Speech-to-text 


The Conceptual Model recognizes that all students should be held to the same expectations for 
instruction in CCSS and have available to them universal accessibility features. It also recognizes 
that some students may have certain characteristics and access needs that require the use of 
accommodations for instruction and when they participate in the Smarter Balanced assessments. 


These Guidelines present the current universal tools, designated supports, and accommodations 
adopted by the Smarter Balanced states to ensure valid assessment results for all students taking 
its assessments. 



Structure of This Document 
This document is divided into several parts: 

• Introduction: This section introduces the document and the conceptual model that is the 
basis for the universal tools, designated supports, and accommodations in the Guidelines. 

• Section I: This section features the Consortium’s universal tools. 

• Section II: This section features the designated supports available on Smarter Balanced 
assessments. 

• Section III: This section features the accommodations available on Smarter Balanced 
assessments. 

• Appendix A: This appendix provides a summary list of Smarter Balanced’s universal tools, 
designated supports, and accommodations. 

• Appendix B: This appendix describes lessons learned from research on universal design, 
accessibility tools, and accommodations. 



Section I: Smarter Balanced Universal Tools 


What Are Universal Tools? 


Universal tools are access features of the assessment that are either provided as digitally-delivered 
components of the test administration system or separate from it. Universal tools are available to all 
students based on student preference and selection. 


Embedded Universal Tools 


The Smarter Balanced digitally-delivered assessments include a wide array of embedded universal 
tools. These are available to all students as part of the technology platform. 


Table 1 lists the embedded universal tools available to all students for computer administered 
Smarter Balanced assessments. It includes a description of each tool. Although these tools are 
generally available to all students, educators may determine that one or more might be distracting 
for a particular student, and thus might indicate that the tool should be turned off for the 
administration of the assessment to the student (See Section II - Designated Supports). 


Table 1. Embedded Universal Tools Available to All Students 


Universal Tool: Description 


Breaks: The number of items per session can be flexibly defined based on the 
student’s need. Breaks of more than 20 minutes will prevent the student from 
returning to items already attempted by the student. There is no limit on the 
number of breaks that a student might be given. The use of this universal tool 
may result in the student needing additional overall time to complete the 
assessment. 


Calculator (for calculator-allowed items only) (See Non-embedded Accommodations for 
students who cannot use the embedded calculator): An embedded on-screen digital calculator 
can be accessed for calculator-allowed items when students click on the calculator button. 
This tool is available only with the specific items for which the Smarter Balanced Item 
Specifications indicated that it would be appropriate. When the embedded 
calculator, as presented for all students, is not appropriate for a student (for 
example, for a student who is blind), the student may use the calculator 
offered with assistive technology devices (such as a talking calculator or a 
braille calculator). 


Digital notepad: This tool is used for making notes about an item. The digital notepad is 
item-specific and is available through the end of the test segment. Notes are not 
saved when the student moves on to the next segment or after a break of 
more than 20 minutes. 


English Dictionary (for ELA-performance task full writes): An English dictionary may be 
available for the full write portion of an ELA 
performance task, pending contractual discussions. A full write is the second 
part of a performance task. The use of this universal tool may result in the 
student needing additional overall time to complete the assessment. 


English glossary: Grade- and context-appropriate definitions of specific construct-irrelevant 
terms are shown in English on the screen via a pop-up window. The student 
can access the embedded glossary by clicking on any of the pre-selected 
terms. The use of this universal tool may result in the student needing 
additional overall time to complete the assessment. 




Expandable passages: Each passage or stimulus can be expanded so that it takes up a larger portion 
of the screen. 


Global notes (for ELA performance tasks): Global notes is a notepad that is available for ELA 
performance tasks in which students complete a full write. A full write is the second part of a 
performance task. The student clicks on the notepad icon for the notepad to appear. During 
the ELA performance tasks, the notes are retained from segment to segment 
so that the student may go back to the notes even though the student is not 
able to go back to specific items in the previous segment. 


Highlighter: A digital tool for marking desired text, item questions, item answers, or parts 
of these with a color. Highlighted text remains available throughout each test 
segment. 


Keyboard navigation: Navigation throughout text can be accomplished by using a keyboard. 


Mark for review: Allows students to flag items for future review during the assessment. 
Markings are not saved when the student moves on to the next segment or 
after a break of more than 20 minutes. 


Math tools: These digital tools (i.e., embedded ruler, embedded protractor) are used for 
measurements related to math items. They are available only with the specific 
items for which the Smarter Balanced Item Specifications indicate that one or 
more of these tools would be appropriate. 


Spell check (for ELA items): Writing tool for checking the spelling of words in student-generated 
responses. Spell check only gives an indication that a word is misspelled; it 
does not provide the correct spelling. This tool is available only with the 
specific items for which the Smarter Balanced Item Specifications indicated 
that it would be appropriate. Spell check is bundled with other embedded 
writing tools for all performance task full writes (planning, drafting, revising, 
and editing). A full write is the second part of a performance task. 


Strikethrough: Allows users to cross out answer options. If an answer option is an image, a 
strikethrough line will not appear, but the image will be grayed out. 


Writing tools: Selected writing tools (i.e., bold, italic, bullets, undo/redo) are available for all 
student-generated responses. (Also see spell check.) 


Zoom: A tool for making text or other graphics in a window or frame appear larger on 
the screen. The default font size for all tests is 14 pt. The student can make 
text and graphics larger by clicking the Zoom In button. The student can click 
the Zoom Out button to return to the default or smaller print size. When using 
the zoom feature, the student only changes the size of text and graphics on 
the current screen. To increase the default print size of the entire test (from 
1.5X to 3.0X default size), the print size must be set for the student in the Test 
Information Distribution Engine (TIDE, or state’s comparable platform), or set 
by the test administrator prior to the start of the test. This is the only feature 
that test administrators can set. The use of this universal tool may result in 
the student needing additional overall time to complete the assessment. 
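

Several entries in Table 1 share the same persistence rule: a break of more than 20 minutes, or a 
move to the next test segment, ends access to earlier work. The sketch below restates that rule in 
code for illustration only. The function name, parameter names, and dictionary keys are 
hypothetical and do not describe how the test delivery platform is implemented. 

```python
BREAK_LIMIT_MINUTES = 20  # threshold stated in Table 1


def state_after_pause(break_minutes: float, moved_to_next_segment: bool = False) -> dict:
    """Summarize what Table 1 says survives a pause in testing.

    break_minutes: length of the break the student took.
    moved_to_next_segment: whether the student has moved on to the next segment.
    """
    # "More than 20 minutes" is a strict comparison: exactly 20 is still allowed.
    long_break = break_minutes > BREAK_LIMIT_MINUTES
    lost = long_break or moved_to_next_segment
    return {
        # Breaks: a long break prevents returning to items already attempted;
        # Global notes: items in a previous segment cannot be revisited.
        "can_return_to_attempted_items": not lost,
        # Digital notepad: notes are not saved after either event.
        "digital_notepad_saved": not lost,
        # Mark for review: markings are not saved after either event.
        "review_marks_saved": not lost,
    }


if __name__ == "__main__":
    print(state_after_pause(break_minutes=20))
    print(state_after_pause(break_minutes=21))
```

Global notes are a stated exception for ELA performance tasks: they are retained from segment to 
segment. The table does not say how global notes behave after a long break, so the sketch leaves 
them out. 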




Non-embedded Universal Tools 


Some universal tools may need to be provided outside of the computer test administration system. 
These tools, shown in Table 2, are to be provided locally for those students. They can be made 
available to any student. 


Table 2. Non-embedded Universal Tools Available to All Students 
Universal Tool: Description 


Breaks: Breaks may be given at predetermined intervals or after completion of 
sections of the assessment for students taking a paper-based test. 
Sometimes students are allowed to take breaks when individually needed to 
reduce cognitive fatigue when they experience heavy assessment demands. 
The use of this universal tool may result in the student needing additional 
overall time to complete the assessment. 


English Dictionary (for ELA-performance task full writes): An English dictionary can be provided 
for the full write portion of an ELA 
performance task. A full write is the second part of a performance task. The 
use of this universal tool may result in the student needing additional overall 
time to complete the assessment. 


Scratch paper: Scratch paper to make notes, write computations, or record responses may 
be made available. Only plain paper or lined paper is appropriate for ELA. 
Graph paper is required beginning in sixth grade and can be used on all math 
assessments. A student can use an assistive technology device for scratch 
paper as long as the device is certified.¹ 

CAT: All scratch paper must be collected and securely destroyed at the end of 
each CAT assessment session to maintain test security. 

Performance Tasks: For mathematics and ELA performance tasks, if a 
student needs to take the performance task in more than one session, 
scratch paper may be collected at the end of each session, securely stored, 
and made available to the student at the next performance task testing 
session. Once the student completes the performance task, the scratch 
paper must be collected and securely destroyed to maintain test security. 


Thesaurus (for ELA-performance task full writes): A thesaurus provides synonyms of terms that a 
student can consult while interacting with text 
included in the assessment. A full write is the second part of a performance 
task. The use of this universal tool may result in the student needing 
additional overall time to complete the assessment. 


Appendix A provides a summary of universal tools, designated supports, and accommodations (both 
embedded and non-embedded) available for the Smarter Balanced assessments. 


1 Smarter Balanced is working closely with our test administration platform vendor to create a process through which 
assistive technology devices can be certified. Certification ensures that the device functions properly and appropriately 
addresses test security. 



Section II: Smarter Balanced Designated Supports 


What Are Designated Supports? 


Designated supports for the Smarter Balanced assessments are those features that are available for 
use by any student for whom the need has been indicated by an educator (or team of educators with 
parent/guardian and student). Scores achieved by students using designated supports will be 
included for federal accountability purposes. It is recommended that a consistent process be used to 
determine these supports for individual students. All educators making these decisions should be 
trained on the process and should be made aware of the range of designated supports available. 
Smarter Balanced states have identified digitally-embedded and non-embedded designated supports 
for students for whom an adult or team has indicated a need for the support. 


Designated supports need to be identified prior to assessment administration. Embedded and non- 
embedded supports must be entered into the Test Information Distribution Engine (TIDE, or state’s 
comparable platform). Any non-embedded designated supports must be acquired prior to testing. 


Who Makes Decisions About Designated Supports? 


Informed adults make decisions about designated supports. Ideally, the decisions are made by all 
educators familiar with the student’s characteristics and needs, as well as those supports that the 
student has been using during instruction and for other assessments. Student input to the decision, 
particularly for older students, is also recommended. 


Forthcoming professional development materials to be available through Smarter Balanced will 
provide suggestions of processes that may be used if a district or school does not have an existing 
process in place for adults and others to make decisions about designated supports. The use of an 
Individual Student Assessment Accessibility Profile (ISAAP), created and provided by Smarter 
Balanced, is one process that may be used to determine which designated supports should be 
available for an individual student. Schools may choose to use another decision-making process. 
Regardless of the process used, all embedded designated supports must be activated prior to 
testing by entering information in the TIDE, or state’s comparable platform. 


Embedded Designated Supports 


Table 3 lists the embedded designated supports available to all students for whom the need has 
been indicated. It includes a description of each support along with recommendations for when the 
support might be needed. 


Table 3. Embedded Designated Supports 


Each entry below gives the designated support, its description, and recommendations for use. 


Color contrast 
Description: Enable students to adjust screen background or font color, based on 
student needs or preferences. This may include reversing the colors for the entire 
interface or choosing the color of font and background. 
Recommendations for Use: Students with attention difficulties may need this support for 
viewing test content. It also may be needed by some students with visual impairments or other 
print disabilities (including learning disabilities). Choice of colors should be 
informed by evidence that color selections meet the student’s needs. 


Masking 
Description: Masking involves blocking off content that is not of immediate need or that may be 
distracting to the student. Students are able to focus their attention on a specific 
part of a test item by masking. 
Recommendations for Use: Students with attention difficulties may need to mask content not of 
immediate need or that may be distracting during the assessment. This support also may be 
needed by students with print disabilities (including learning disabilities) or visual 
impairments. Masking allows students to hide and reveal individual answer options, 
as well as all navigational buttons and menus. 


Text-to-speech (for math stimuli items and ELA items, not for reading passages)² 
(See Embedded Accommodations for ELA reading passages) 
Description: Text is read aloud to the student via embedded text-to-speech technology. The 
student is able to control the speed as well as raise or lower the volume of the 
voice via a volume control. 
Recommendations for Use: Students who are struggling readers may need assistance accessing the 
assessment by having all or portions of the assessment read aloud. This support 
also may be needed by students with reading-related disabilities, or by students 
who are blind and do not yet have adequate braille skills. This support will 
likely be confusing and may impede the performance of students who do not 
regularly have the support during instruction. Students who use text-to-speech 
will need headphones unless tested individually in a separate setting. 


Translated test directions (for math items) 
Description: Translation of test directions is a language support available prior to 
beginning the actual test items. Students can see test directions in another language. 
Recommendations for Use: Students who have limited English language skills can use the translated 
directions support. This support should only be used for students who are 
proficient readers in the other language and not proficient in English. 


Translations (glossaries) (for math items) 
Description: Translated glossaries are a language support. The translated glossaries are 
provided for selected construct-irrelevant terms for math. Translations for these 
terms appear on the computer screen when students click on them. Students 
with the language glossary setting enabled can view the translated glossary. 
Students can also select the audio icon next to the glossary term and listen to the 
audio recording of the glossary. 
Recommendations for Use: Students who have limited English language skills (whether or not 
designated as ELLs or ELLs with disabilities) can use the translation 
glossary for specific items. The use of this support may result in the student needing 
additional overall time to complete the assessment. 


Translations (stacked) (for math items) 
Description: Stacked translations are a language support. Stacked translations are not 
available for some students; stacked translations provide the full translation of 
each test item above the original item in English. 
Recommendations for Use: For students whose primary language is not English and who use dual 
language supports in the classroom, use of the stacked (dual language) translation may 
be appropriate. Students participate in the assessment regardless of the 
language. This support will increase reading load and cognitive load. The use 
of this support may result in the student needing additional overall time to 
complete the assessment. 


Turn off any universal tools 
Description: Disabling any universal tools that might be distracting or that students do not 
need to use, or are unable to use. 
Recommendations for Use: Students who are easily distracted (whether or not designated as having 
attention difficulties or disabilities) may be overwhelmed by some of the universal 
tools. Knowing which specific tools may be distracting is important for determining 
which tools to turn off. 


2 See Embedded Accommodations for guidelines on the use of Text-to-speech for ELA reading passages. 


Non-embedded Designated Supports 


Some designated supports may need to be provided outside of the digital-delivery system. These 
supports, shown in Table 4, are to be provided locally for those students unable to use the 
designated supports when provided digitally. 


Table 4. Non-embedded Designated Supports 


Each entry below gives the designated support, its description, and recommendations for use. 


Bilingual dictionary (for ELA-performance task full writes) 
Description: A bilingual/dual language word-to-word dictionary is a language support. A 
bilingual/dual language word-to-word dictionary can be provided for the full 
write portion of an ELA performance task. A full write is the second part of a 
performance task. 
Recommendations for Use: For students whose primary language is not English and who use dual 
language supports in the classroom, use of a bilingual/dual language word-to-word 
dictionary may be appropriate. Students participate in the assessment regardless 
of the language. The use of this support may result in the student needing 
additional overall time to complete the assessment. 


Color contrast 
Description: Test content of online items may be printed with different colors. 
Recommendations for Use: Students with attention difficulties may need this support for viewing 
the test when digitally-provided color contrasts do not meet their needs. Some students with 
visual impairments or other print disabilities (including learning disabilities) 
also may need this support. Choice of colors should be informed by evidence of 
those colors that meet the student’s needs. 


Color overlays 
Description: Color transparencies are placed over a paper-based assessment. 
Recommendations for Use: Students with attention difficulties may need this support to view test 
content. This support also may be needed by some students with visual impairments or 
other print disabilities (including learning disabilities). Choice of color should be 
informed by evidence of those colors that meet the student’s needs. 


Magnification 
Description: The size of specific areas of the screen (e.g., text, formulas, tables, graphics, and 
navigation buttons) may be adjusted by the student with an assistive technology 
device. Magnification allows increasing the size to a level not provided for by the 
Zoom universal tool. 
Recommendations for Use: Students used to viewing enlarged text, graphics, or navigation buttons 
may need magnification to comfortably view content. This support also may meet the 
needs of students with visual impairments and other print disabilities. 
The use of this designated support may result in the student needing additional 
overall time to complete the assessment. 


Read aloud (for math items and ELA items, not for reading passages) 
(See Non-embedded Accommodations for ELA reading passages) 
Description: Text is read aloud to the student by a trained and qualified human reader who 
follows the administration guidelines provided in the Smarter Balanced Test 
Administration Manual. All or portions of the content may be read aloud. 
Recommendations for Use: Students who are struggling readers may need assistance accessing the 
assessment by having all or portions of the assessment read aloud. This support 
also may be needed by students with reading-related disabilities, or by students 
who are blind and do not yet have adequate braille skills. If not used 
regularly during instruction, this support is likely to be confusing and may impede 
performance on assessments. Readers should be provided to students 
on an individual basis - not to a group of students. A student should have the 
option of asking a reader to slow down or repeat text. The use of this support may 
result in the student needing additional overall time to complete the assessment. 


Scribe (for ELA non-writing items and math items)³ (See Accommodations for Writing) 
Description: Students dictate their responses to a human who records verbatim what they 
dictate. The scribe must be trained and qualified, and must follow the 
administration guidelines provided in the Smarter Balanced Test Administration 
Manual. 
Recommendations for Use: Students who have documented significant motor or processing 
difficulties, or who have had a recent injury (such as a broken hand or arm) 
that makes it difficult to produce responses may need to dictate their 
responses to a human, who then records the students’ responses verbatim. The 
use of this support may result in the student needing additional overall time to 
complete the assessment. 


Separate setting 
Description: Test location is altered so that the student is tested in a setting different 
from that made available for most students. 
Recommendations for Use: Students who are easily distracted (or may distract others) in the 
presence of other students, for example, may need an alternate location to be able to take the 
assessment. The separate setting may be in a different room that allows them to 
work individually or among a smaller group, or in the same room but in a 
specific location (for example, away from windows, doors, or pencil sharpeners, in a 
study carrel, near the teacher’s desk, or in the front of a classroom). Some 
students may benefit from being in an environment that allows for movement, 
such as being able to walk around. In some instances, students may need to 
interact with instructional or test content outside of school, such as in a hospital or 
their home. A specific adult, trained in a manner consistent with the TAM, can act 
as test proctor (test administrator) when the student requires it. 


Translated test directions 
Description: PDF of directions translated in each of the languages currently supported. 
Bilingual adult can read to student. 
Recommendations for Use: Students who have limited English language skills (whether or not 
designated as ELLs or ELLs with disabilities) can use the translated test 
directions. In addition, a biliterate adult trained in the test administration manual 
can read the test directions to the student. The use of this support may 
result in the student needing additional overall time to complete the assessment. 


Translations (glossaries) (for math items) 
Description: Translated glossaries are a language support. Translated glossaries are 
provided for selected construct-irrelevant terms for math. Glossary terms are listed 
by item and include the English term and its translated equivalent. 
Recommendations for Use: Students who have limited English language skills can use the 
translation glossary for specific items. The use of this support may result in the student needing 
additional overall time to complete the assessment. 


3 See Accommodations for use of Scribe for Writing items 


Appendix A provides a summary of universal tools, designated supports, and accommodations (both 
embedded and non-embedded) available for the Smarter Balanced assessments. 




Section III: Smarter Balanced Accommodations 


What Are Accommodations? 


Accommodations are changes in procedures or materials that increase equitable access during the 
Smarter Balanced assessments. Assessment accommodations generate valid assessment results 
for students who need them; they allow these students to show what they know and can do. Smarter 
Balanced states have identified digitally-embedded and non-embedded accommodations for 
students for whom there is documentation of the need for the accommodations on an Individualized 
Education Program (IEP) or 504 accommodation plan. One exception to the IEP or 504 requirement 
is for students who have had a physical injury (e.g., broken hand or arm) that impairs their ability to 
use a computer. These students may use the speech-to-text or the scribe accommodations (if they 
have had sufficient experience with the use of these), as noted in this section. 


Determination of which accommodations an individual student will have available for the 
assessment is necessary because these accommodations must be made available before the 
assessment, either by entering information into the TIDE, or state’s comparable platform, for 
embedded accommodations, or by ensuring that the materials or setting are available for the 
assessment for non-embedded accommodations. 


The Smarter Balanced Test Administration and Student Access Workgroup recognized that 
accommodations could increase cognitive load or create other challenges for students who do not 
need them or who have not had experience using them. Because of this possibility, Smarter 
Balanced states agreed that a student’s parent/guardian should know about the availability of 
specific accommodations through a parent/guardian report. This would ensure that 
parents/guardians are aware of the conditions under which their child participated in the 
assessment. Information included in the parent/guardian report should not be the basis for any 
educational decisions (such as eligibility for an Advanced Placement class) nor for 
documenting/reporting the use of the accommodation elsewhere (such as on a transcript). 


Who Makes Decisions About Accommodations? 


IEP teams and educators make decisions about accommodations. These teams (or educators for 
504 plans) provide evidence of the need for accommodations and ensure that they are noted on the 
IEP or 504 plan. 


The IEP team (or educator developing the 504 plan) is responsible for ensuring that information from 
the IEP is entered into the TIDE, or state’s comparable platform, so that all embedded 
accommodations can be activated prior to testing. This can be accomplished by identifying one 
person from the team to enter information into the TIDE, or state’s comparable platform, or by 
providing information to the test coordinator who enters into the TIDE, or state’s comparable 
platform, a form that lists all accommodations and designated supports needed by individual 
students on IEPs or 504 plans. 


Embedded Accommodations 


Table 5 lists the embedded accommodations available for the Smarter Balanced assessments for 
those students for whom the accommodations are included on an IEP or 504 plan. The table 
includes a description of each accommodation along with recommendations for when the 
accommodation might be needed and how it can be used. For those accommodations that may be 
considered controversial, a description of considerations about the use of the accommodation is 
provided. 


Table 5. Embedded Accommodations 


Accommodation: American Sign Language (ASL) (for ELA Listening items and math items)
Description: Test content is translated into ASL video. ASL human signer and the signed test
content are viewed on the same screen. Students may view portions of the ASL video as often as
needed.
Recommendations for Use: Some students who are deaf or hard of hearing and who typically use ASL
may need this accommodation when accessing text-based content in the assessment. The use of this
accommodation may result in the student needing additional overall time to complete the
assessment. For many students who are deaf or hard of hearing, viewing signs is the only way to
access information presented orally. It is important to note, however, that some students who
are hard of hearing will be able to listen to information presented orally if provided with
appropriate amplification and a setting in which extraneous sounds do not interfere with clear
presentation of the audio presentation in a listening test.


Accommodation: Braille
Description: A raised-dot code that individuals read with the fingertips. Graphic material
(e.g., maps, charts, graphs, diagrams, and illustrations) is presented in a raised format (paper
or thermoform). Contracted and non-contracted braille is available; Nemeth code is available for
math.
Recommendations for Use: Students with visual impairments may read text via braille. Tactile
overlays and graphics also may be used to assist the student in accessing content through touch.
Refreshable braille is available only for ELA because Nemeth Code is not available via
refreshable braille. For math, braille will be presented via embosser; embosser-created braille
can be used for ELA also. The type of braille presented to the student (contracted or
non-contracted) is set in TIDE, or state’s comparable platform. The use of this accommodation
may result in the student needing additional overall time to complete the assessment.


Accommodation: Closed captioning (for ELA listening items)
Description: Printed text that appears on the computer screen as audio materials are presented.
Recommendations for Use: Students who are deaf or hard of hearing and who typically access
information presented via audio by reading words that appear in synchrony with the audio
presentation may need this support to access audio content. For many students who are deaf or
hard of hearing, viewing words (sometimes in combination with reading lips and ASL) is how they
access information presented orally. It is important to note, however, that some students who
are hard of hearing will be able to listen to information presented orally if provided with
appropriate amplification and a setting in which extraneous sounds do not interfere with clear
presentation of the audio presentation in a listening test.


Accommodation: Text-to-speech (for ELA reading passages)
Description: Text is read aloud to the student via embedded text-to-speech technology. The
student is able to control the speed as well as raise or lower the volume of the voice via a
volume control.
Recommendations for Use: This accommodation is appropriate for a very small number of students
(estimated to be approximately 1-2% of students with disabilities participating in a general
assessment).

• For students in grades 3-5, text-to-speech will not be an available accommodation. Content
  experts agree that this accommodation should not be provided during these grades because it
  would compromise the construct being measured.
• For students in grades 6-8 and 11, text-to-speech is available as an accommodation for
  students whose need is documented in an IEP or 504 plan.

Reports can be run to indicate the percent of students who had access to text-to-speech on
reading test passages.

Students who use text-to-speech will need headphones unless tested individually in a separate
setting.


Non-embedded Accommodations 


Table 6 lists the non-embedded accommodations available for the Smarter Balanced assessments 
for those students for whom the accommodations are documented on an IEP or 504 plan. The table 
includes a description of each accommodation, along with recommendations for when the 
accommodation might be needed and how it can be used. For those accommodations that may be 
considered controversial, a description of considerations about the use of the accommodation is 
provided. 


Table 6. Non-embedded Accommodations Available 


Accommodation: Abacus
Description: This tool may be used in place of scratch paper for students who typically use an
abacus.
Recommendations for Use: Some students with visual impairments who typically use an abacus may
use an abacus in place of using scratch paper.


Accommodation: Alternate response options
Description: Alternate response options include but are not limited to adapted keyboards, large
keyboards, StickyKeys, MouseKeys, FilterKeys, adapted mouse, touch screen, head wand, and
switches.
Recommendations for Use: Students with some physical disabilities (including both fine motor and
gross motor skills) may need to use the alternate response options accommodation. Some alternate
response options are external devices that must be plugged in and be compatible with the
assessment delivery platform.


Accommodation: Calculator (for calculator allowed items only)
Description: A non-embedded calculator for students needing a special calculator, such as a
braille calculator or a talking calculator, currently unavailable within the assessment platform.
Recommendations for Use: Students with visual impairments who are unable to use the embedded
calculator for calculator-allowed items will be able to use the calculator that they typically
use, such as a braille calculator or a talking calculator. Test administrators should ensure
that the calculator is available only for designated calculator items.


Accommodation: Multiplication Table (grade 4 and above math items)
Description: A paper-based single digit (1-9) multiplication table will be available from
Smarter Balanced for reference.
Recommendations for Use: For students with a documented and persistent calculation disability
(i.e., dyscalculia).


Accommodation: Noise Buffers
Description: Ear mufflers, white noise, and/or other equipment used to block external sounds.
Recommendations for Use: Student (not groups of students) wears equipment to reduce
environmental noises. Students may have these testing variations if regularly used in the
classroom. Students who use noise buffers will need headphones unless tested individually in a
separate setting.


Accommodation: Print on demand
Description: Paper copies of either passages/stimuli and/or items are printed for students. For
those students needing a paper copy of a passage or stimulus, permission for the students to
request printing must first be set in TIDE, or state’s comparable platform. For those students
needing a paper copy of one or more items, the Smarter Balanced Help Desk (1-855-833-1969) must
be contacted by the school or district coordinator to have the accommodation set for the student.
Recommendations for Use: Some students with disabilities may need paper copies of either
passages/stimuli and/or items. A very small percentage of students should need this
accommodation. The use of this accommodation may result in the student needing additional time
to complete the assessment.


Accommodation: Read aloud (for ELA reading passages, grades 6-8 and 11; blind students in
grades 3-8 and 11 who do not yet have adequate braille skills)
Description: Text is read aloud to the student by a trained and qualified human reader who
follows the administration guidelines provided in the Smarter Balanced Test Administration
Manual. All or portions of the content may be read aloud.
Recommendations for Use: This accommodation is appropriate for a very small number of students
(estimated to be approximately 1-2% of students with disabilities participating in a general
assessment).

• For students in grades 3-5, read aloud will not be an available accommodation. Content experts
  agree that this accommodation should not be provided during these grades because it would
  compromise the construct being measured.
• For students in grades 6-8 and 11, read aloud is available as an accommodation for students
  whose need is documented in an IEP or 504 plan.

Reports can be run to indicate the percent of students who had access to read aloud on reading
test passages.

Readers should be provided to students on an individual basis - not to a group of students. A
student should have the option of asking a reader to slow down or repeat text.


Accommodation: Scribe (See Designated Supports for math and non-writing ELA)
Description: Students dictate their responses to a human who records verbatim what they dictate.
The scribe must be trained and qualified, and must follow the administration guidelines provided
in the Smarter Balanced Test Administration Manual.
Recommendations for Use: Students who have documented significant motor or processing
difficulties, or who have had a recent injury (such as a broken hand or arm) that makes it
difficult to produce responses may need to dictate their responses to a human, who then records
the students’ responses verbatim. The use of this accommodation may result in the student
needing overall additional time to complete the assessment. For many of these students,
dictating to a human scribe is the only way to demonstrate their composition skills. It is
important that these students be able to develop planning notes via the human scribe, and to
view what they produce while composing via dictation to the scribe.


Accommodation: Speech-to-text
Description: Voice recognition allows students to use their voices as input devices to the
computer, to dictate responses or give commands (e.g., opening application programs, pulling
down menus, and saving work). Voice recognition software generally can recognize speech up to
160 words per minute. Students may use their own assistive technology devices.
Recommendations for Use: Students who have motor or processing disabilities (such as dyslexia)
or who have had a recent injury (such as a broken hand or arm) that make it difficult to produce
text or commands using computer keys may need alternative ways to work with computers. Students
will need to be familiar with the software, and have had many opportunities to use it prior to
testing. Speech-to-text software requires that the student go back through all generated text to
correct errors in transcription, including use of writing conventions; thus, prior experience
with this accommodation is essential. If students use their own assistive technology devices,
all assessment content should be deleted from these devices after the test for security
purposes. For many of these students, using voice recognition software is the only way to
demonstrate their composition skills. Still, use of speech-to-text does require that students
know writing conventions and that they have the review and editing skills required of students
who enter text via the computer keyboard. It is important that students who use speech-to-text
also be able to develop planning notes via speech-to-text, and to view what they produce while
composing via speech-to-text.


Resources 


Christensen, L., Carver, W., VanDeZande, J., & Lazarus, S. (2011). Accommodations manual: How to 
select, administer, and evaluate the use of accommodations for instruction and assessment of 
students with disabilities (3rd ed.). Washington, DC: Assessing Special Education Students State 
Collaborative on Assessment and Student Standards, Council of Chief State School Officers. 


Christensen, L., Shyyan, V., Schuster, T., Mahaley, P., & Saez, S. (2012). Accommodations manual: 
How to select, administer, and evaluate use of accommodations for instruction and assessment of 
English language learners. Minneapolis, MN: University of Minnesota, National Center on 
Educational Outcomes. 


Fedorchak, G. (2012). Access by Design - Implications for equity and excellence in education. Draft 
paper prepared for the Smarter Balanced Assessment Consortium. 


Measured Progress. (2013). Framework for Accessibility and Accommodations. Smarter Balanced 
Assessment Consortium. (Forthcoming Spring 2014) 


National Center on Educational Outcomes. (2009). Accommodations bibliography. Minneapolis, MN: 
University of Minnesota, National Center on Educational Outcomes. Available at: 
https://apps.cehd.umn.edu/nceo/accommodations/ 


National Council on Measurement in Education. (2012). Testing and data integrity in the 
administration of statewide student assessment programs. 


Professional Development Module. (Forthcoming Spring 2014) 


Shyyan, V., Christensen, L., Touchette, B., Lightborne, L., Gholson, M., & Burton, K. (2013). 
Accommodations manual: How to select, administer, and evaluate use of accommodations for 
instruction and assessment of English language learners with disabilities. Minneapolis, MN: 
University of Minnesota, National Center on Educational Outcomes. 


Smarter Balanced. (2012). Translation accommodations framework for testing ELLs in mathematics. 
Available at:
http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/09/Translation-Accommodations-Framework-for-Testing-ELL-Math.pdf


Smarter Balanced. (2012). Accommodations for English Language Learners and Students with 
Disabilities: A research-based decision algorithm. Available at: 
http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/08/Accomodations-for-under-represented-students.pdf


Appendix A: Summary of Smarter Balanced Universal Tools, 
Designated Supports, and Accommodations 


Embedded

Universal Tools: Breaks; Calculator [1]; Digital Notepad; English Dictionary [2]; English
Glossary; Expandable Passages; Global Notes; Highlighter; Keyboard Navigation; Mark for Review;
Math Tools [3]; Spell Check [4]; Strikethrough; Writing Tools [5]; Zoom

Designated Supports: Color Contrast; Masking; Text-to-Speech [6]; Translated Test Directions [7];
Translations (Glossary) [8]; Translations (Stacked) [9]; Turn off Any Universal Tools

Accommodations: American Sign Language [10]; Braille; Closed Captioning [11]; Text-to-Speech [12]


Non-embedded

Universal Tools: Breaks; English Dictionary [13]; Scratch Paper; Thesaurus [14]

Designated Supports: Bilingual Dictionary [15]; Color Contrast; Color Overlay; Magnification;
Read Aloud; Scribe [16]; Separate Setting; Translated Test Directions; Translations
(Glossary) [17]

Accommodations: Abacus; Alternate Response Options [18]; Calculator [19]; Multiplication
Table [20]; Noise Buffers; Print on Demand; Read Aloud; Scribe; Speech-to-Text


* Items shown are available for ELA and Math unless otherwise noted.


[1] For calculator-allowed items only
[2] For ELA performance task full-writes
[3] Includes embedded ruler, embedded protractor
[4] For ELA items
[5] Includes bold, italic, underline, indent, cut, paste, spell check, bullets, undo/redo.
[6] For ELA items (not ELA reading passages) and math items
[7] For math items
[8] For math items
[9] For math test
[10] For ELA listening items and math items
[11] For ELA listening items
[12] For ELA reading passages grades 6-8 and 11
[13] For ELA performance task full-writes
[14] For ELA performance task full-writes
[15] For ELA performance task full-writes
[16] For ELA non-writing items and math items
[17] For math items
[18] Includes adapted keyboards, large keyboards, StickyKeys, MouseKeys, FilterKeys, adapted
mouse, touch screen, head wand, and switches.
[19] For calculator-allowed items only
[20] For math items beginning in grade 4.


Appendix B: Research-based Lessons Learned about Universal 
Design, Accessibility Tools, and Accommodations 


More than half of all states in the United States participated in research spurred by the opportunity 
that states had to develop alternate assessments based on modified achievement standards (AA- 
MAS). The research conducted since 2007 provides numerous findings that are relevant to the next 
generation assessments. Lessons learned from this research that are relevant to the Smarter 
Balanced assessment system are highlighted here.[21]


Who might benefit from accessibility features identified by AA-MAS research? 


Several studies explored the characteristics of students who might benefit from an AA-MAS and the 
accessibility features incorporated in the assessment. These studies consistently found: 

• Students with and without Individualized Education Programs (IEPs) and 504 plans would
  likely benefit from assessments with increased accessibility features.

• Students identified for the AA-MAS or who were among the lowest performing students in a
  state tended to be males, ethnic or racial minorities, English language learners, or from low
  socioeconomic backgrounds.

• Students identified for the AA-MAS tended to have difficulty with:

  - Print materials
  - High vocabulary load materials
  - Directions
  - Multi-step problem solving

• Students identified for the AA-MAS tended to have:

  - Distractibility
  - Limited meta-cognitive skills
  - Poor organizational skills
  - Poor self-monitoring skills
  - Slower work pace
  - Limited working memory capacity


What changes can be made to test items and tests that do not change the 
construct being assessed? 


Many studies examined the effects of changes to test items or the tests themselves. Among those 
changes that did not violate the construct were: 
• Enhanced directions
• Increased size of text and visuals
• Increased white space
• Simplified formats, including simplified visuals
• Underlining


[21] The research used to develop this summary was highlighted in the document Lessons Learned in Federally Funded
Projects That Can Improve the Instruction and Assessment of Low Performing Students with Disabilities, edited by M.
Thurlow, S. Lazarus, and S. Bechard (2012), available at www.nceo.info/OnlinePubs/LessonsLearned.pdf, and
presentations by the authors of three of the chapters in the Lessons Learned report, Sue Bechard, Vince Dean, Sheryl
Lazarus, and Shelly Loving-Ryder, along with representatives from the two general assessment consortia (PARCC - Tamara
Reavis; Smarter Balanced - Magda Chia).


Among those changes that might not violate the construct, depending on how the construct was 
specifically defined, were: 


• Adding visuals
• Bolding text
• Simplifying language in item stems
• Changing distractors by editing the attractive distractor or changing the order of distractors
• Chunking text by embedding questions within a passage
• Reordering items
• Providing thought questions or hint boxes
• Scaffolding for vocabulary, definition, context, inference, or complex questions


Other findings highlighted the need for individualized decisions about some accessibility features. 
For example: 


• Read-aloud features are differentially effective for and preferred by students
• Some features increase engagement and motivation in students
• Too many features can be confusing to students


Researchers found that students needed to have the opportunity to practice new item types and new 
accessibility features. In addition, their research emphasized the benefits of cognitive labs and item 
tryouts with students. 


What can test developers do to build on the lessons learned from AA-MAS 
research and implementation? 


Many studies and AA-MAS implementation efforts pointed to considerations for test developers. For 
example: 
• Require item-writer training that focuses on universal design and accessibility principles
• Develop items from scratch rather than attempting to modify existing items to increase
  universal design and accessibility characteristics
• Ensure that all users understand the purpose of the assessment through professional
  development activities
• Always consider format changes that might increase the accessibility of items and tests, but
  make changes to content and cognitive load only after careful delineation of the purpose and
  content targets of the assessment.
• Engage in research on the effects of individual changes and combinations of changes
  intended to increase universal design and accessibility.
• Implement innovative items with caution, and only after exploring the accessibility
  implications of the innovative items.


Appendix C: Frequently Asked Questions 


Smarter Balanced states identified frequently asked questions (FAQs) and developed applicable 
responses to support the information provided in the Smarter Balanced Assessment Consortium’s 
Usability, Accessibility, and Accommodations Guidelines. These questions and responses, as well as 
the information in the Guidelines document apply to the Smarter Balanced interim and summative 
assessments. 


States may use these FAQs to assist districts and schools with transitioning from their former 
assessments to the Smarter Balanced assessments. In addition, the FAQs may be used by districts 
to ensure understanding among staff and schools regarding the universal tools, designated 
supports, and accommodations available for the Smarter Balanced assessments. Schools may use 
them with decision-making teams (including parents) as decisions are made and implemented with 
respect to use of the Smarter Balanced Usability, Accessibility, and Accommodations Guidelines. 


Additional information to aid in the implementation of the Guidelines is available in the Individual 
Student Assessment Accessibility Profile (ISAAP) Module, the Test Administration Manual, and the 
Implementation Guide. These documents will be made available over the next few weeks. 


The FAQs are organized into four sections. First are general questions. Second is a set of questions 
about specific universal tools and designated supports. Questions that pertain specifically to English 
language learners (ELLs) comprise the third set of FAQs, and questions that pertain specifically to 
students with disabilities comprise the fourth set of FAQs. 


General FAQs 


1. What are the differences among the three categories of universal tools, designated supports, 
and accommodations? 


Universal tools are access features that are available to all students based on student 
preference and selection. Designated supports for the Smarter Balanced assessments are 
those features that are available for use by any student (including English language learners, 
students with disabilities, and English language learners with disabilities) for whom the need 
has been indicated by an educator or team of educators (with parent/guardian and student 
input as appropriate). Accommodations are changes in procedures or materials that increase 
equitable access during the Smarter Balanced assessments by generating valid assessment 
results for students who need them and allowing these students the opportunity to show 
what they know and can do. The Usability, Accessibility, and Accommodations Guidelines 
identify accommodations for students for whom there is documentation of the need for the 
accommodations on an Individualized Education Program (IEP) or 504 accommodation plan. 


Universal tools, designated supports, and accommodations may be either embedded in the 
test administration system or provided locally (non-embedded). 


2. Which students should use each category of universal tools, designated supports, and 
accommodations? 


Universal tools are available to all students, including those receiving designated supports 
and those receiving accommodations. Designated supports are available only to students for 
whom an adult or team (consistent with state-designated practices) has indicated the need 
for these supports (as well as those students for whom the need is documented). 
Accommodations are available only to those students with documentation of the need 
through either an Individualized Education Program (IEP) or a 504 accommodation plan. 
Students who have IEPs or 504 accommodation plans also may use designated supports 
and universal tools. 


What Tools Are Available for my Student? 


All students    English language learners (ELLs)    Students with disabilities    ELLs with disabilities


1 Only for instances that an adult (or team) has deemed the supports appropriate for a specific student’s testing needs. 
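
The eligibility rules in the answer to question 2 can be sketched as a small lookup. This is an illustration only, not part of the Guidelines or of any Smarter Balanced system: the `Student` record, its field names, and the `available_categories` function are hypothetical names introduced here, and state-designated practices may add conditions that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical record; field names are assumptions made for this sketch.
    has_iep_or_504: bool = False              # need documented on an IEP or 504 plan
    support_indicated_by_adult: bool = False  # an adult or team indicated the need


def available_categories(student: Student) -> list[str]:
    """Return the categories of access features open to a student."""
    # Universal tools are available to all students.
    categories = ["universal tools"]
    # Designated supports require an indication of need by an adult or team;
    # students with IEPs or 504 plans may also use them.
    if student.support_indicated_by_adult or student.has_iep_or_504:
        categories.append("designated supports")
    # Accommodations require documentation on an IEP or 504 plan.
    if student.has_iep_or_504:
        categories.append("accommodations")
    return categories
```

For example, a student with no documented or indicated need would receive universal tools only, while a student with an IEP would receive all three categories.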


3. What is the difference between embedded and non-embedded approaches? How might 
educators decide what is most appropriate? 


Embedded versions of the universal tools, designated supports, and accommodations are 
provided digitally through the test delivery system while non-embedded versions are provided 
at the local level through means other than the test delivery system. The choice between 
embedded and non-embedded universal tools and designated supports should be based on 
the individual student’s needs. The decision should reflect the student’s prior use of, and 
experience with, both embedded and non-embedded universal tools, designated supports, 
and accommodations. It is important to note that although Print on Demand is a non- 
embedded accommodation, permission for students to request printing must first be set in 
the Test Information Distribution Engine (TIDE) or the state’s comparable platform.


4. Who determines how non-embedded accommodations (such as read aloud) are provided? 


IEP teams and educators make decisions about non-embedded accommodations. These 
teams (or educators for 504 plans) provide evidence of the need for accommodations and 
ensure that they are noted on the IEP or 504 plan (see Guidelines, pages 15-17). States are 
responsible for ensuring that districts and schools follow Smarter Balanced guidance on the 
implementation of these accommodations (see [professional development materials]). 


5. Are any students eligible to use text-to-speech for ELA reading passages on the Smarter 
Balanced assessments? 


For students in grades 3-5, text-to-speech and read-aloud are not available on ELA reading 
passages. The use of text-to-speech (or read aloud) on ELA reading passages for grades 3-5 
will result in invalid scores. In grades 6-8 and 11, text-to-speech and read-aloud are available 
for ELA reading passages as an accommodation for students whose need is documented on 
an IEP or 504 plan (see Guidelines, pages 10 and 15), subject to each member state's laws, 
regulations, and policies. Text-to-speech and read-aloud for ELA reading passages are not 
available for ELLs (unless the student has an IEP or 504 plan). Whenever text-to-speech is 
used, appropriate headphones must be available to the student, unless the student is tested 
individually in a separate setting. 


6. Why are some accommodations that were previously allowed for my state assessment not listed 
in the Smarter Balanced Usability, Accessibility, and Accommodations Guidelines? 


After examining the latest research and conducting numerous discussions with external and 
state experts, Smarter Balanced member states approved a list of universal tools, 
designated supports, and accommodations applicable to the current design and constructs 
being measured by its tests and items within them. Upon review of new research findings or 
other evidence applicable to accessibility and accommodations considerations, the list of 
specific universal tools, designated supports, and accommodations approved by Smarter
Balanced may be subject to change. The Consortium will establish a standing committee, 
including members from Governing States, to review suggested adjustments to the list of 
universal tools, designated supports, and accommodations to determine whether changes 
are warranted. 


Proposed changes to the list of universal tools, designated supports, and accommodations 
will be brought to Governing States for review, feedback, and approval. Furthermore, states 
may issue temporary approvals (i.e., one summative assessment administration) for unique
accommodations for individual students. 


State leads will evaluate formal requests for unique accommodations and determine 
whether the request poses a threat to the measurement of the construct. The formal 
requests will include documentation of the student need, the specific nature of the universal 
tools, designated supports, or accommodations, and the plan for follow-up monitoring of use. 
Upon issuing a temporary approval, the State will send documentation of the approval to the 
Consortium. The Consortium will consider all state-approved temporary accommodations as 
part of the Consortium’s accommodations review process. The Consortium will provide to 
member states a list of the temporary accommodations issued by states that are not 
Consortium-approved accommodations. In subsequent years, states will not be able to offer 
as a temporary accommodation any temporary accommodation that has been rejected by 
the Consortium. 


7. Under which conditions may a state elect not to make available to its students an 
accommodation that is allowed by Smarter Balanced? 


The Consortium recognizes that there should be a careful balance between the need for 
uniformity among member states and the need for states to maintain their autonomy. To 
maintain this balance, individual states may elect not to make available an accommodation 
that is in conflict with the member state's laws, regulations, or policies. 


8. Can states allow additional universal tools, designated supports, or accommodations to 
individual students on a case by case basis? 


Yes, only in certain restricted and emergent circumstances. To address emergent issues that 
arise at the local level, authorized staff in member states will have the authority to approve 
temporary unique testing conditions for individual students. Because it is unknown whether a 
temporarily provided universal tool, designated support, or accommodation actually belongs
in the defined categories, all such temporary testing conditions are considered to be unique
accommodations. Authorized state staff includes only those individuals who are familiar with 
the constructs the Smarter Balanced assessments are measuring, so that students are not 
inadvertently provided with universal tools, designated supports, or accommodations that
violate the constructs being measured. 


The unique accommodations approved by a state for individual students will be submitted to 
Smarter Balanced for review. Temporary unique accommodations accepted by Smarter 
Balanced will be incorporated into the official guidelines released by Smarter Balanced in the 
following year. Authorized state staff members are not to add any universal tools, designated 
supports, or accommodations to the Smarter Balanced Guidelines; only the Smarter
Balanced Consortium may do so. 


9. What is to be done for special cases of “sudden” physical disability? 


One exception to the IEP or 504 requirement is for students who have had a physical injury 
(e.g., broken hand or arm) that impairs their ability to use a computer. For these situations, 
students may use the speech-to-text or scribe accommodations (if deemed appropriate 
based on the student having had sufficient experience with the use of the accommodations) 
(see Guidelines, page 13). 


10. Who reviewed the Smarter Balanced Guidelines? 


In addition to individuals and officials from the Smarter Balanced governing states, several 
organizations and their individual members provided written feedback on the guidelines: 


American Federation of Teachers
California School for the Blind
California School for the Deaf
Californians Together
California State Teach
Center for Applied Special Technology
Center for Law and Education
Conference of Educational Administrators of Schools and Programs for the Deaf
Council for Exceptional Children
Council of the Great City Schools
Council of Parent Attorneys and Advocates
Learning Disabilities Association of Maryland
Mexican American Legal Defense and Education Fund
Missouri School Boards’ Association
Missouri Council of Administrators of Special Education
National Center for Learning Disabilities
The Advocacy Institute
The National Hispanic University


11. Where can a person go to get more information about making decisions on the use of 
designated supports and accommodations? 


Practice tests provide students with experiences that are critical for success in navigating the
platform easily. The practice tests may be particularly important for those students who will 
be using designated supports or accommodations, because the practice tests can provide 
data that may be useful in determining whether a student might benefit from the use of a
particular designated support or accommodation. Smarter Balanced practice tests are
available at http://www.smarterbalanced.org/pilot-test/. 

In addition, it is recommended that decision makers refer to professional development
materials provided by Smarter Balanced or state offices on the Individual Student 
Assessment Accessibility Profile (ISAAP) or state-developed process, as well as other state- 
developed materials consistent with the Smarter Balanced Implementation Guide. 


Additional information on the decision-making process, and ways to promote a thoughtful 
process rather than an automatic reliance on a checklist or menu, is available through 
materials developed by groups of states.22


12. What security measures need to be taken before, during, and after the assessment for students 
who use universal tools, designated supports, or accommodations? 


Test security involves maintaining the confidentiality of test questions and answers, and is 
critical in ensuring the integrity of a test and validity of test results. Ensuring that only 
authorized personnel have access to the test and that test materials are kept confidential is 
critical in technology-based assessments. In addition, it is important to guarantee that (a) 
students are seated in such a manner that they cannot see each other’s terminals, (b) 
students are not able to access any unauthorized programs or the Internet while they are 
taking the assessment, and (c) students are not able to access any externally-saved data or 
computer shortcuts while taking the test. Prior to testing, the IEP team should check on 
compatibility of assistive technology devices and make appropriate adjustments if necessary. 
When a non-embedded designated support or accommodation is used that involves a human 
having access to items (e.g., reader, scribe), procedures must be in place to ensure that the 
individual understands and has agreed to security and confidentiality requirements. Test 
administrators need to (a) keep testing materials in a secure place to prevent unauthorized 
access, and (b) keep all test content confidential and refrain from sharing information or 
revealing test content. 


Printed test items/stimuli, including embossed Braille printouts, must be collected and 
inventoried at the end of each test session and securely shredded immediately. DO NOT 
keep printed test items/stimuli for future test sessions. 


The following test materials must be securely shredded immediately after each testing 
session and may not be retained from one testing session to the next: 


• Scratch paper and all other paper handouts written on by students during testing;

  - Please note, for mathematics and ELA performance tasks, if a student needs
    to take the performance task in more than one session, scratch paper may
    be collected at the end of each session, securely stored, and made available
    to the student at the next performance task testing session. Once the student
    completes the performance task, the scratch paper must be collected and
    securely destroyed to maintain test security.

• Any reports or other documents that contain personally identifiable student
  information;

• Printed test items or stimuli.


Additional information on this topic is provided in the Test Administration Manual (TAM).


22 These materials were developed by collaboratives of states to address decision making for students with disabilities,
ELLs, and ELLs with disabilities:


• Accommodations Manual: How to Select, Administer, and Evaluate Use of Accommodations for Instruction and
Assessment of Students with Disabilities (3rd ed.). Washington, DC: Assessing Special Education Students State
Collaborative on Assessment and Student Standards, Council of Chief State School Officers. Available at:
www.ccsso.org/Resources/Programs/Assessing_Special_Education_Students_(ASES).html.

• Accommodations Manual: How to Select, Administer, and Evaluate Use of Accommodations for Instruction and
Assessment of English Language Learners. Washington, DC: Assessing English Language Learners
State Collaborative on Assessment and Student Standards, Council of Chief State School Officers. Available at:
www.ccsso.org/Resources/Programs/English_Language_Learners_(ELL).html.

• Accommodations Manual: How to Select, Administer, and Evaluate Use of Accommodations for Instruction and
Assessment of English Language Learners with Disabilities. Washington, DC: Assessing Special Education Students
and English Language Learners State Collaboratives on Assessment and Student Standards, Council of Chief State
School Officers. Available at:
www.ccsso.org/Resources/Publications/Accommodations_Manual_How_to_Select_Administer_and_Evaluate_Use_of_Accommodations_for_Instruction_and_Assessment_of_English_Language_Learners_with_Disabilities.html.


13. Who is supposed to input information about designated supports and accommodations into the 
Test Information Distribution Engine (TIDE) or into a state’s comparable platform? How is the 
information verified? 


Generally a school or district will designate a person to enter information into the TIDE or the 
state’s comparable platform. Often this person is a test coordinator. For those students for 
whom an IEP team (or educator developing the 504 plan) is identifying designated supports 
as well as accommodations, that team or educator is responsible for ensuring that 
information from the IEP (or 504 plan) is entered appropriately so that all embedded 
accommodations can be activated prior to testing. 


Entry of information for IEP and 504 students can be accomplished by identifying one person 
from the team to enter information or by providing information to the person designated by 
the school or district to enter data into the TIDE. For students who are ELLs, an educator who 
knows the student well and is familiar with the instructional supports used in the classroom
should provide information to the person designated to enter information into the TIDE. 


14. Are there any supplies that schools need to provide so that universal tools, designated supports, 
and accommodations can be appropriately implemented? 


Schools should determine the number of headphones they will provide (for text-to-speech, as 
well as for the listening test) and other non-embedded universal tools (e.g., thesaurus), 
designated supports (e.g., bilingual dictionary), and accommodations (e.g., multiplication 
table) for students. An alternative is to identify these as items that students will provide on 
their own. 


15. What happens when accommodations listed in the Usability, Accessibility, and Accommodations 
Guidelines do not match any accommodations presented in the student’s IEP? 


IEP teams should consider accommodations a student needs in light of the Smarter 
Balanced Guidelines. If it is decided that a specific accommodation is needed that is not 
included in the Guidelines, the team should submit a request to the state. The state contact 
will judge whether the proposed accommodation poses a threat to the constructs measured 
by the Smarter Balanced assessments; based on that judgment the state contact will either 
issue a temporary approval or will deny the request. Temporary approvals will be forwarded 
to a standing committee; this committee makes a recommendation to the Governing States 
about future incorporation of new accommodations into the Smarter Balanced Guidelines. 




Universal Tools and Designated Supports FAQs (Available to All Students) 


16. Is the digital notepad universal tool fully available for ELA and Math? Will a student’s notes be 
saved if the student takes a 20-minute break? 


The digital notepad is available on all items across both content areas. As long as a student 
or test administrator activates the test within the 20-minute break window, the notes will still 
be there. There is no limit on the number of pauses that a student can take in one test 
sitting. 


17. For the global notes universal tool, if a student takes a break of 20 minutes do the notes
disappear? 


Global notes, which are used for ELA performance tasks only, will always be available until 
the student submits the test, regardless of how long a break lasts or how many breaks are 
taken. 


18. For the highlighter universal tool, if a student pauses a test for 20 minutes, do the highlighter
marks disappear? 


If a student is working on a passage or stimulus on a screen and pauses the test for 20
minutes to take a break, the student will still have access to the information visible on that 
particular screen. However, students do lose access to any information highlighted on a 
previous screen. 


19. How are students made aware that the spell check universal tool (for ELA) and the math 
universal tools (i.e., calculator) are available when moving from item to item? 


When appropriate, math items include universal tools available for students to use. For the 
spell check tool, a line will appear under misspelled words. 


20. For the zoom universal tool, is the default size specific to certain devices? Will the test 
administrator’s manual provide directions on how to do this adjustment? 


The default size is available to all students and is not specific to certain devices. Information 
on how to use the zoom universal tool is included in the directions at the beginning of each 
test. Please note that in addition to zoom, students may have access to magnification, which 
is a non-embedded designated support. 


21. For the English glossary universal tool, how are terms with grade- and context-appropriate 
definitions made evident to the student? 


Selected terms have a light rectangle around them. If a student hovers over the terms, the 
terms with the attached glossary are highlighted. A student can click on the terms and a pop- 
up window will appear. In addition, a student can click on the audio button next to each term 
to hear it. 


22. For the mark-for-review universal tool, will selections remain visible after a 20-minute break? 


If a student takes a break for longer than 20 minutes, the student will not be able to access
items from previous screens. 
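

The pause rules in questions 16, 17, and 22 can be summarized in a short Python sketch. This is
illustrative only and is not part of the test delivery system: the function names, the tool labels,
and the treatment of a pause of exactly 20 minutes as falling inside the window are assumptions
made for the example.

```python
# Illustrative sketch only; names and the boundary handling are assumptions.
PAUSE_WINDOW_MINUTES = 20


def notes_retained(tool: str, pause_minutes: float, test_submitted: bool = False) -> bool:
    """Return True if a student's notes are still available after a pause."""
    if tool == "global_notes":
        # Global notes stay available until the student submits the test,
        # regardless of how long a break lasts.
        return not test_submitted
    if tool == "digital_notepad":
        # Notepad entries remain if the test is reactivated within the window.
        return pause_minutes <= PAUSE_WINDOW_MINUTES
    raise ValueError(f"unknown tool: {tool}")


def previous_screens_accessible(pause_minutes: float) -> bool:
    """Return True if items on earlier screens can still be revisited."""
    # A break longer than the window closes off items on previous screens.
    return pause_minutes <= PAUSE_WINDOW_MINUTES
```

For instance, under these assumptions a 45-minute break would clear the digital notepad and close
off previous screens, while global notes would remain until the test is submitted.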




23. Can universal tools be turned off if it is determined that they will interfere with the student’s 
performance on the assessment? 


Yes. A universal tool can be turned off if an adult (or team) determines that it might be
distracting or that the student does not need it or is unable to use it. This information must
be noted in TIDE prior to test administration.


FAQs Pertaining to English Language Learners (ELLs) 


24. How are the language access needs of ELLs addressed in the Smarter Balanced Usability,
Accessibility, and Accommodations Guidelines? 


The language access needs of ELLs are addressed through the provision of numerous 
universal tools and designated supports. These include universal tools such as English 
dictionaries for full writes and English glossaries, and designated supports such as 
translated test directions and glossaries. These are not considered accommodations in the 
Smarter Balanced assessment system. No accommodations are available for ELLs on the 
Smarter Balanced assessments; accommodations are only available to students with 
disabilities and ELLs with disabilities. 


25. Is text-to-speech available for ELLs to use? 


Text-to-speech is available as a designated support to all students (including ELLs) for whom 
an adult or team has indicated it is needed for math items and for ELA items (but not ELA 
reading passages). Text-to-speech for ELA reading passages is available for an ELL in grades 
6-8 or 11 only if the student has an IEP or 504 plan. For text-to-speech to be available for an 
ELL, it must be entered into the TIDE. 
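

The availability rules for text-to-speech described in questions 5 and 25 can be expressed as a
short Python sketch. This is illustrative only: the function name, the content-type labels, and
the returned strings are assumptions made for the example, and the sketch does not model state
laws, regulations, or policies, or the requirement that the support be entered into the TIDE.

```python
# Illustrative sketch only; names and labels are assumptions, not a Smarter Balanced API.
READING_PASSAGE_GRADES = frozenset({6, 7, 8, 11})


def text_to_speech_status(grade: int, content: str,
                          has_iep_or_504: bool, need_indicated: bool) -> str:
    """Classify how text-to-speech may be offered for a given kind of content."""
    if content == "ela_reading_passage":
        # Reading passages: an accommodation in grades 6-8 and 11 only, and only
        # with a documented need on an IEP or 504 plan. Not available in grades 3-5.
        if grade in READING_PASSAGE_GRADES and has_iep_or_504:
            return "accommodation"
        return "not available"
    if content in ("math_item", "ela_item"):
        # Math items and ELA items: a designated support for any student
        # (including ELLs) for whom an adult or team has indicated a need.
        return "designated support" if need_indicated else "not available"
    raise ValueError(f"unknown content type: {content}")
```

Under these assumptions, an ELL in grade 7 without an IEP or 504 plan could receive text-to-speech
on math items as a designated support but not on ELA reading passages.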


26. What languages are available to ELLs in text-to-speech? 


Text-to-speech is currently available only in English. However, the translated glossaries 
include an audio component automatically available to any student with the translated 
glossaries embedded designated support. 


27. For which content areas will the Consortium provide translation supports for students whose
primary language is not English? 


For Mathematics, the Consortium will provide full translations in American Sign Language, 
stacked translations in Spanish (with the Spanish translation presented directly above the 
English item), and primary language pop-up glossaries in various languages and dialects 
including Spanish, Vietnamese, Arabic, Tagalog, Ilokano, Cantonese, Mandarin, Korean, 
Punjabi, Russian, and Ukrainian. For the Listening portion of the English Language Arts 
assessment, Smarter Balanced will provide full translations in American Sign Language 
delivered digitally through the test delivery system. 


Only translations that have gone through the translation process outlined in the Smarter
Balanced Translation framework are accepted as supports
(http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/09/Translation-Accommodations-Framework-for-Testing-ELL-Math.pdf).




28. Does a student need to be identified as an English language learner in order to receive 
translation and language supports? What about foreign language exchange students? 


Translations and language supports are provided as universal tools and designated supports. 
Universal tools are available to all students. Designated supports are available to those 
students for whom an adult (or team) has determined a need for the support. Thus, these are 
available to all students, regardless of their status as an ELL. Foreign language exchange 
students would have access to all universal tools and those designated supports that have 
been indicated by an adult (or team). 


29. For the translated test directions designated support, what options are available for students 
who do not understand the language available in the digital format? Can a human reader of 
directions in the native language be provided? 


If a student needs a read aloud/text-to-speech accommodation in another language, then
the test directions should be provided in that other language. The reader or text-to-speech
device must be able to provide the directions in the student’s language without difficulty due
to accent or register. To ensure quality and standardized directions, the reader or text-to-
speech device should only use directions that have undergone professional translation by
the Consortium prior to testing. Smarter Balanced is providing a PDF of the translated test 
directions in each of the languages supported by the translated glossary designated support: 
Spanish, Vietnamese, Arabic, Tagalog, Ilokano, Cantonese, Mandarin, Korean, Punjabi,
Russian, and Ukrainian. 


30. How is the translations glossary non-embedded designated support different from the bilingual 
dictionary? 


The translations glossary non-embedded designated support includes the customized 
translation of pre-determined construct-irrelevant terms that are most challenging to English 
language learners. The translation of the terms is context-specific and grade-appropriate. 
Bilingual dictionaries often do not provide context-specific information nor are they 
customized. In addition, the translated glossary includes an audio support. 


31. Will translations be available in language dialects/variants? 


Translated glossaries will be available in different languages and dialects including Spanish, 
Vietnamese, Arabic, Tagalog, Ilokano, Cantonese, Mandarin, Korean, Punjabi, Russian, and 
Ukrainian. 


FAQs Pertaining to Students with Disabilities 


32. What accommodations are available for students with disabilities (including ELLs with 
disabilities)? 


Students with disabilities (including those who are ELLs) can use embedded 
accommodations (e.g., American Sign Language, braille, speech-to-text) and non-embedded
accommodations (e.g., abacus, alternate response options) that have been documented on 
an IEP or 504 accommodations plan. These students also may use universal tools and 
designated supports. A full list of accommodations can be found in the Guidelines 
documents, tables 5 and 6. 




33. Is an embedded ASL accommodation available on ELA items that are not part of the Listening
test? 


The embedded ASL accommodation is not currently available on any ELA items that are not 
part of the Listening claim. For the Listening test, a deaf or hard of hearing student who has 
a documented need in an IEP or 504 plan may use ASL. 


34. Will sign languages other than ASL (including signing in other languages) be available?


Currently, only ASL is available. 


35. Can interpreters be used for students who are deaf or hard of hearing who do not use ASL? 


Smarter Balanced has consulted with external experts who have unanimously advised 
against this practice. Research indicates severe challenges with standardization and quality. 


36. What options do districts have for administering Smarter Balanced assessments to students 
who are blind? 


Students who are blind and who prefer to use braille should have access to either 
refreshable braille (only for ELA) or embosser-created braille (for ELA or math). For those 
students who are blind and prefer to use text-to-speech, access to text-to-speech should be 
provided for the math test, and for ELA items only (text-to-speech is not permitted on ELA 
reading passages without a specific documented need in the student’s IEP or 504 plan). 
Text-to-speech use for ELA reading passages is only permitted for those students in grades 6- 
8 and 11. Students should participate in the decision about the accommodation they prefer 
to use, and should be allowed to change during the assessment if they ask to do so. 
Students can have access to both Braille and text-to-speech that is embedded in the Smarter 
Balanced assessment system. 


37. Why is the non-embedded abacus an accommodation for the non-calculator items? Doesn’t an 
abacus serve the same function as a calculator? 


An abacus is similar to the sighted student using paper and pencil to write a problem and do 
calculations. The student using the abacus has to have an understanding of number sense 
and must know how to do calculations with an abacus. 


38. Can students without documented disabilities who have had a sudden injury use any of the 
Smarter Balanced accommodations? 


Students without documented disabilities who have experienced a physical injury that 
impairs their ability to use a computer may use some accommodations, provided they have
had sufficient experience with them. Both speech-to-text and scribe are accommodations 
that are available to students who have experienced a physical injury such as a broken hand 
or arm, or students who have become blind through an injury and have not had sufficient 
time to learn braille. Prior to testing a student with a sudden physical injury, regardless of 
whether a 504 plan is started, Test Administrators should contact their district test 
coordinator or other authorized individuals to ensure the test registration system accurately 
describes the student’s status and any accommodations that the student requires. 




39. How will the test administrator know prior to testing that the print on demand accommodation 
may be needed? 


The test administrator will know this information prior to testing because accommodations 
need to be documented beforehand and print on demand is an accommodation. Any 
accommodations - including both embedded and non-embedded accommodations - need 
to be entered into the TIDE. The print on demand accommodation applies to either 
passages/stimuli or items, or both. 


40. For the print on demand accommodation, how are student responses recorded - by a teacher
using a computer or some other method? 


The method of recording student responses depends on documentation in the IEP or 504 
plan (e.g., after first recording responses on the paper version, the student could enter
responses into the computer or the teacher could enter responses into the computer).
Anyone who is designated to enter responses into the computer must have read, agreed to, 
and signed a test security agreement. 


41. How do state officials monitor training and qualifications for the non-embedded read aloud 
accommodation? 


States will need to develop processes and procedures to monitor training and the 
qualifications of individuals who provide the read aloud accommodation when text-to-speech 
is not appropriate for a student. State officials can use the Smarter Balanced audio 
guidelines available online to obtain additional information about recommended processes 
to follow (http://www.smarterbalanced.org/smarter-balanced-assessments/#item). 


42. If students are using their own devices that incorporate word prediction, will this impact their 
score? 


The student’s score will not be affected under these circumstances. Students using these
devices must still use their knowledge and skills to review and edit their answers. 


43. How are assistive technology (AT) devices certified for use for the Smarter Balanced 
assessments? 


Assistive technology device manufacturers may use the Smarter Balanced practice test as a 
method of determining if a device works with the assessment. In addition, schools and 
districts can use the practice test to evaluate devices to ensure their functions are consistent 
with those allowed in the UAAG. 




Revision Log 




Updates to the Smarter Balanced Usability, Accessibility, and Accommodations Guidelines are
captured in this Revision Log. Updates are based on requests from states that do not impact policy.
Any changes impacting policy require discussion and vote by Governing States. Updates captured in 
the Revision Log are separated into two categories: 


• Clarification: Updates of this type add details to existing information included in the
Guidelines.

• Increased Flexibility: Updates of this type reflect explicatory information included in the
Guidelines that result in augmented access to Smarter Balanced assessments.


Revisions are captured in tracking tables according to category. In cases where both Clarification 
and Increased Flexibility edits are made, changes to the Guidelines will be captured in the Increased 
Flexibility tracking table. 


| Section | Page | Clarification: Description of Changes | Date | Version |
| Table 3 | 9 | Consistently used the term “ELA reading passages” instead of “ELA passages” to clarify availability of text-to-speech as an embedded designated support. | 03/12/14 | 1.2 |
| Table 4 | 11 | Consistently used the term “ELA reading passages” instead of “ELA passages” to clarify availability of read aloud as a non-embedded designated support. | 03/12/14 | 1.2 |
| Table 5 | 15 | Consistently used the term “ELA reading passages” instead of “ELA passages” to clarify availability of text-to-speech as an embedded accommodation. | 03/12/14 | 1.2 |
| Table 6 | 16 | Consistently used the term “ELA reading passages” instead of “ELA passages” to clarify availability of read aloud as a non-embedded accommodation. | 03/12/14 | 1.2 |
| Table 3 | 10 | Added verbiage clarifying the audio component of translated glossaries. | 08/01/14 | 2.1 |


| Section | Page | Increased Flexibility: Description of Changes | Date | Version |
| Table 2 | 8 | Scratch paper, the non-embedded universal tool, description has additional details regarding the performance task testing sessions: “For mathematics and ELA performance tasks, if a student needs to take the performance task in more than one session, scratch paper may be collected at the end of each session, securely stored, and made available to the student at the next performance task testing session. Once the student completes the performance task, the scratch paper must be collected and securely destroyed to maintain test security.” | 03/12/14 | 1.2 |
| Table 4 | 13 | Added information regarding the availability of translated test directions in PDF format. New accessibility resource also added to Figure 1 and Appendix A. | 08/01/14 | 2.1 |
| Table 4 | 13 | To separate setting, added that, “A specific adult, trained in a manner consistent with the TAM, can act as test proctor (test administrator) when student requires it.” | 08/01/14 | 2.1 |
| Table 6 | 17 | Added information regarding the availability of noise buffers. New accessibility resource also added to Figure 1 and Appendix A. | 08/01/14 | 2.1 |
| Appendix | 24 | Added the FAQs section. | 08/01/14 | 2.1 |




Appendix H— Small Scale Trials Technical Report 


Smarter Balanced Assessment Consortium:
Small Scale Trials Technical Report


Developed by: The American Institutes for Research 
July 25, 2013 


Executive Summary 


The Smarter Balanced Assessment Consortium seeks to develop a testing framework that assesses 
student performance with authentic instruments that closely resemble the classroom learning 
experience. This will be accomplished through the integration of technology to achieve better 
measures of deeper learning outcomes that have been difficult to efficiently measure in the past. 
The Consortium seeks to understand and extend the existing state of the art in automated scoring. 


These objectives will be met with a financially sustainable model in which substantial parts of the 
test are automatically scored, either exclusively or in conjunction with some human process. The 
feedback to students is envisioned to be provided immediately after completion of the assessment. 
The Consortium recognizes that this quick response can be accomplished only through the use of 
computer-based testing and the development of automated scoring models for constructed- 
response items. 


The Small Scale Trial represents the second study on automated scoring models. An initial report 
(The Initial Analysis of the Essay Scoring Engine) examined the application of an essay scoring engine 
applied to four Reading items presently in use in two Consortium states. These items were not 
intended to be scored using automated scoring models, but the analysis provided an initial look at 
items presently in use in Consortium states. Automated scoring models built using this data helped 
inform the models built for the fall 2012 Small Scale Trials. The Small Scale Trial data was used to 
further evaluate and improve the essay scoring engine and provide a first look at the application of 
the propositional scoring model to Consortium items. The results from these analyses will further 
inform the essay scoring model vision for the Consortium prior to the 2014 field test. 


The Small Scale Trial sought to expand on the initial model building effort in a number of ways: 


1. Items included in the Small Scale Trial test forms were built to Smarter Balanced 
specifications. 

2. Items were examined in three content areas (reading, writing, and mathematics), at three 
grade levels (4, 7, and 11). A refined essay scoring model was applied to writing essays 
using multiple rubrics. 

3. The propositional model was applied to short content-based constructed responses. 


The purpose of this document is to present the findings from the Small Scale Trial study. 


Background 


For standardized tests, we expect scores to be comparable over time and over different 
administrations and forms of a test. One of the main concerns of constructed-response scoring is 
scoring consistency. The constructed-response scoring process often includes elaborate systems for 
monitoring the consistency and accuracy of the scores (e.g., back-reading, validity scoring, double 
scoring). Even with highly structured training and monitoring, the scoring process still leaves room for 
differences of opinion among raters. 


Computer-automated scoring (CAS) has the potential to make the scoring of constructed-response
items more objective. It also has the benefit of making constructed-response test items practical for
use in situations where scoring by human scorers is not realistic. This is especially true for
computer-adaptive testing, where computer-automated scoring can immediately provide data for use
in the selection of subsequent items.


Williamson, Bejar, and Hone (1999) listed a number of advantages of modern CAS systems over 
human scoring. With CAS systems, a given response will always receive the same results 
(reproducibility); the same scoring criteria are consistently applied to all responses (consistency); 
specific reasons and processes behind computer scoring can be traced, investigated, and 
manipulated (tractability); items can be constructed in a more precise fashion (item specification); 
responses can be evaluated at a higher level of precision and specificity (granularity); scoring criteria 
are better articulated, and much of the subjectivity in human scoring can be removed (objectivity); 
scoring outcomes are likely to be more reliable (reliability); and the scoring process can be less 
demanding in terms of time, resources, and cost (efficiency). 


The validity of computer-assigned scores using various scoring engines has been evaluated by 
comparing computer-assigned scores with human scores (Attali, Powers, Freedman, Harrison, & 
Obetz, 2008; Bennett, Steffen, Singley, Morley, & Jacquemin, 1997; Klein, 2008; Yang, Buckendahl, 
Juszkiewicz, & Bhola, 2002). The quantitative methods included agreements and correlations 
between raters and between rater and machine scores. Scores were also compared across
subgroups. Although the results varied across subjects and item types, the overall findings
demonstrated that computer-assigned scores were very similar to human scores and suggested that 
machine scoring could facilitate the use of constructed-response items in large-scale testing 
programs by providing a fast, accurate, and efficient way to score responses. Attali et al. (2008) 
evaluated the quality of computer-automated scoring for open-ended items (those requiring a short 
answer of one to three sentences) of GRE® Subject Test items in biology and psychology using the 
c-rater™ scoring engine. The kappa agreements (agreement beyond chance) were higher for 
psychology questions than for biology questions; however, both the human-human agreement and 
human-computer agreement were moderate to high. 


Forms Design 


The scope of the Small Scale Trial targeted computer-automated item scoring in grades 4, 7, and 11. 
Within each grade, a separate 15-item test consisting of selected-response (SR) items was 
constructed for reading, writing, and mathematics. Three constructed-response (CR) items (intended 
for machine-scoring) were included in the test forms in reading and math at each grade level. For 
writing, three pairs of constructed-response sets were constructed for each grade. Two sets included 
a brief writing CR and an essay (long) writing prompt. One set included a research CR and essay 
(long) writing prompt. 


Table 1 shows the number of items included in the Smarter Balanced Small Scale Trials 
administration at each grade. Note that the same SR set was used for each writing form in a given 
grade. 


Table 1. Number of Items Administered at Each Grade for the Smarter Balanced Small Scale Trials 


Form - Subject | Grade | Selected-Response Items | Constructed-Response Items | Total Items


Data 


The goal was to obtain 1000 responses for each of the five test forms in grades 4, 7, and 11. To 
obtain this sample, 911 schools were selected to yield a projected sample of approximately 730 
schools in 23 Consortium states. In the end, 427 schools from 21 states participated. The full 
Sampling Plan can be found in Appendix A. 


Human Scoring 


All student constructed responses were scored by two trained human scorers (100% double reads). 
Responses that received nonadjacent scores (e.g., a 2 and a 4) were routed to an expert scorer for a 
third independent reading. Since 0-1 point items could not have nonadjacent scores, each 0-1
point response was scored by two independent readers, and if the scores were not an exact match, a
third expert reader scored the response. 
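
The routing rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in the text; the function name `needs_third_read` and its parameters are hypothetical and do not come from the scoring vendor's system.

```python
def needs_third_read(first: int, second: int, max_points: int) -> bool:
    """Return True when a response should be routed to an expert third reader.

    Assumption: max_points is the top score of the item's rubric
    (1 for 0-1 point items, 2 for 0-2 point items, and so on).
    """
    if max_points == 1:
        # 0-1 point items cannot receive nonadjacent scores,
        # so any disagreement between the two readers is routed.
        return first != second
    # For all other items, only nonadjacent scores (more than one point apart) are routed.
    return abs(first - second) > 1


# A 2 and a 4 are nonadjacent, so this response goes to an expert scorer.
print(needs_third_read(2, 4, max_points=4))
```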


The range finding/rubric validation responses formed the basis of the training materials. Range 
finding/rubric validation responses were supplemented with live responses where necessary. These 
responses were selected based on the way that the committees applied the rubrics to the items. All 
training materials were reviewed and approved by Smarter Balanced representatives or by range 
finding committee members. Table 2 details the training materials developed for each item. 


Table 2. Description of Training Materials 


Item Type | Score Range | Anchor Sets (for each item) | Training Sets (for each item) | Qualifying Sets (for 1 item per grade and item type) | Qualifying Rate (exact agreement) | Validity Responses (for each item)
| | 3 responses/score point | 1 set of 5 responses and 1 set of 10 responses (15 total responses) | 2 sets of 10 responses (20 total responses) | 90% | 10
| | 3 responses/score point | 2 sets of 10 responses (20 total responses) | 2 sets of 10 responses (20 total responses) | 80% |
Long Writing Item | (see below for details) | 3 responses/score point | 3 single-trait sets of 5 responses, 1 multi-trait set of 5 responses, and 1 multi-trait set of 10 responses | 4 sets of 10 responses (40 total responses) | 70% in each trait |


Each item was scored by one team of approximately nine to twelve scorers. The only exception was 
the long writing items, which were each scored by two teams of scorers per grade. 


For each item type, scorers were required to qualify on one item per grade. For example, there were 
two grade 4 brief writing items. One of these two items had two qualifying sets. The scorers started 
with the item that had qualifying sets. Since the brief writing items are all scored on a 0-2 point
scale, the scorers were required to correctly score 80% of the responses in one of the two qualifying 
sets. All of the scorers were able to successfully demonstrate the required level of accuracy while 
qualifying. 


Each long writing response received scores in three traits, so some of the training material focused 
on one trait, while other materials considered all three traits: 


• Single-Trait Long Write Anchor Sets
   o Focus and Organization (1-4 point scale): One anchor set with a minimum of three
     responses per score point for each item
   o Elaboration (1-4 point scale): One anchor set with a minimum of three responses per
     score point for each item
   o Conventions (1-2 point scale): One anchor set with a minimum of three responses
     per score point per grade
• Single-Trait Training Sets (Scorers scored one trait for these sets.)
   o Focus and Organization: One 5-response training set per item
   o Elaboration: One 5-response training set per item
   o Conventions: One 5-response training set per grade
• Multi-Trait Training Sets (Scorers scored all three traits for these sets.)
   o Set 1: One 5-response training set per item
   o Set 2: One 10-response training set per item
• Multi-Trait Qualifying Sets (Scorers scored all three traits for these sets.)
   o Scorers qualified on one item per grade. The vendors and Smarter Balanced
     determined which item during range finding.
   o Four 10-response qualifying sets per item
   o In order to qualify, scorers had to demonstrate sufficient accuracy as follows:
      ▪ 70% exact agreement in Focus and Organization on one of the four sets
      ▪ 70% exact agreement in Elaboration on one of the four sets
      ▪ 70% exact agreement in Conventions on one of the four sets
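
The qualification rule for long writing items can be illustrated with a short sketch. It assumes that each trait's 70% threshold may be met on any one of the four sets, not necessarily the same set for all three traits; the text does not settle this point. The names `TRAITS`, `exact_agreement`, and `qualifies` are hypothetical.

```python
TRAITS = ("focus_organization", "elaboration", "conventions")


def exact_agreement(scorer_scores, key_scores):
    """Proportion of responses where the scorer matched the pre-assigned score."""
    matches = sum(1 for s, k in zip(scorer_scores, key_scores) if s == k)
    return matches / len(key_scores)


def qualifies(qualifying_sets, threshold=0.70):
    """Check the long-write qualification rule.

    qualifying_sets: list of dicts, one per qualifying set, mapping each trait
    name to a (scorer_scores, key_scores) pair.
    Assumption: a trait counts as passed if the threshold is reached on at
    least one set, and different traits may pass on different sets.
    """
    return all(
        any(exact_agreement(*qset[trait]) >= threshold for qset in qualifying_sets)
        for trait in TRAITS
    )


key = [1, 2, 3, 4, 2, 3, 1, 2, 3, 4]
perfect = {trait: (key, key) for trait in TRAITS}
print(qualifies([perfect]))
```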


Throughout scoring, several measures were taken to evaluate and monitor quality control. Scorers 
were given ongoing feedback and retraining based on the quality control measures. These quality 
control measures included: 


• Scoring Summary Report: Daily and cumulative reports provided inter-rater agreement rates
  and score point distributions by scorer and room. The cumulative results are reported in
  Table 3.

• Team Leader Read-behinds: Team Leaders and Scoring Directors “spot checked” the scorers’
  performance by reviewing approximately 10% of the responses read by each scorer. If the
  supervisor disagreed with the score given by the scorer, the supervisor corrected the score
  and, as appropriate, shared the response and the corrected score with the scorer as an
  opportunity to provide ongoing feedback and improve scoring accuracy.

• Validity: Validity responses were pre-scored based on the way that the range finding
  committees applied the rubrics. They were distributed to the scorers throughout the scoring
  window, although they were front-loaded to help ensure that every scorer received every
  validity response. The responses were randomly selected; as a result, some validity sets did
  not contain all possible score points. Scorers were not able to distinguish validity responses
  from live student responses, making this a powerful measure of quality control. At least ten
  validity responses were implemented for each item. The validity results are reported in Table
  4.


Table 3. Human Scoring Item Summary Report 


Item Summary Report 
Smarter Balanced Assessment Consortium 
Small Scale Trials 


Inter-Rater Reliability Score Point Distribution 
Subject Grade Domain ID 2X %EX %AD %NA | Total %0 %1 %2 %3 %4 %B¹ %F² %M³ %N⁴ %T⁵ %U⁶
Brief Write Grade 04 43403 2,134 81 18 0 | 2,134 15 46 37 0 0 0 0 0 0 0 1
Brief Write Grade 07 43497 1,412 78 22 1 | 1,412 50 33 10 0 0 0 0 0 5 1 0
Brief Write Grade 07 43964 1,424 85 15 0 | 1,424 8 44 43 0 0 0 0 0 2 4 0
Brief Write Grade 11 43446 1,166 84 16 0 | 1,166 8 50 41 0 0 0 0 0 1 1 1
Long Write Grade 04 Organization 43504 2,088 88 12 0 | 2,088 0 46 32 5 0 0 0 0 4 12 1
Conventions 2,088 81 19 0 | 2,088 24 33 26 0 0 0 0 0 4 12 1
Elaboration 2,088 88 11 0 | 2,088 0 45 33 4 0 0 0 0 4 12 1
Long Write Grade 04 Organization 43334 2,114 86 14 0 | 2,114 0 39 40 9 0 0 0 0 2 0
Conventions 2,114 81 18 0 | 2,114 30 35 23 0 0 0 0 0 2 0
Elaboration 2,114 86 14 0 | 2,114 0 40 40 9 0 0 0 0 2 0
Long Write Grade 04 Organization 43284 2,084 85 14 0 | 2,084 0 39 26 10 2 0 0 0 4 19 1
Conventions 2,084 85 14 0 | 2,084 29 27 19 0 0 0 0 0 4 19 1
Elaboration 2,084 86 14 0 | 2,084 0 38 26 10 1 0 0 0 4 19 1
Long Write Grade 07 Organization 43438 1,396 83 16 0 | 1,396 0 46 34 0 0 0 0 1 11 1
Conventions 1,396 77 22 0 | 1,396 21 34 32 0 0 0 0 0 1 11 1
Elaboration 1,396 84 15 0 | 1,396 0 47 34 6 0 0 0 0 1 11 1


¹ B indicates a condition code designating this response is blank.

² F is a condition code indicating a response is not in English.

³ M indicates that a response is off purpose. Please note that this condition code is only applicable to long writing items. If a long write receives a code of M, it will still be scored for
conventions. If a long write receives any other non-scorable code, it will not be scored in any domain.

⁴ N indicates the response was non-scorable for any reason. For example, this code would be appropriate if a student copied text.

⁵ T is used when a response is off topic. For example, a student responds that they hate pizza, when the item was about helicopters.

⁶ U indicates that the response is unintelligible. This condition code would be appropriate when a student submits random keystrokes or undecipherable text.


Long Write Grade 07 Organization 43703 1,396 88 12 0 | 1,396 0 53 33 2 0 0 0 0 1 10 0
Conventions 1,396 79 20 1 | 1,396 34 31 24 0 0 0 0 0 1 10 0 
Elaboration 1,396 88 12 0 | 1,396 0 53 34 1 0 0 0 0 1 10 0 
Long Write Grade 07 Organization 43469 1,384 84 16 0 1,384 0 39 34 9 2 0 0 3 2 10 0 
Conventions 1,384 81 19 0 | 1.384" 32 35 271 0 0 0 0 0 2 10 0 
Elaboration 1,384 83 17 0 | 1,384 0 41 33 9 2 0 0 3 2 10 0 
Long Write Grade 11 Organization 43632 1,142 86 14 0 1,142 0 39 37 12 0 0 0 0 1 9 2 
Conventions 1,142 81 19 0 | 1,142 15 31 42 0 0 0 0 0 1 9 2
Elaboration 1,142 86 14 0 | 1,142 0 40 37 11 0 0 0 0 1 9 2 
Long Write Grade 11 Organization 43635 1,134 84 16 0 1,134 0 28 43 19 0 0 0 0 0 7 3 
Conventions 1,134 76 24 0 1,134 17 35 37 0 0 0 0 0 0 7 3 
Elaboration 1,134 84 16 0 | 1,134 0 29 43 18 0 0 0 0 0 7 3



Long Write Grade 11 Organization 43479 1,146 85 15 0 1,146 0 36 42 13 0 0 0 0 1 7 1 
Conventions 1,146 80 20 0 | 1,146 21 35 35 0 0 0 0 0 1 7 1
Elaboration 1,146 84 16 0 | 1,146 0 43 40 9 0 0 0 0 1 7 1
Mathematics Grade 04 43572 2,126 95 5 0 | 2,126 56 41 0 0 0 0 0 0 0 3 0 
Mathematics Grade 04 43564 1,438 100 0 0 | 1,438 89 2 0 0 0 0 0 0 1 7 1 
Mathematics Grade 04 43173 2,126 98 2 0 | 2,126 81 14 1 0 0 0 0 0 0 2 1 




Subject | Grade | ID | 2X | %EX | Total | %0
Mathematics | Grade 07 | 43551 | 1,430 | 92 | 1,430 | 79
Mathematics | Grade 07 | 43555 | 1,422 | 92 | 1,422 | 62
Mathematics | Grade 07 | 43557 | 1,422 | 97 | 1,422 | 72
Mathematics | Grade 07 | 43639 | 1,422 | 96 | 1,422 | 63
Mathematics | Grade 11 | 43559 | 1,164 | 99 | 1,164 | 94
Mathematics | Grade 11 | 43552 | 1,122 | 100 | 1,122 | 93
Mathematics | Grade 11 | 43546 | 1,146 | 97 | 1,146 | 75
Reading | Grade 04 | 43707 | 2,134 | 91 | 2,134 | 53
Reading | Grade 04 | 43412 | 2,124 | 91 | 2,124 | 59
Reading | Grade 04 | 43416 | | 93 | | 43
Reading | Grade 07 | 43248 | 1,430 | 80 | 1,430 | 53
Reading | Grade 07 | 43445 | 1,426 | 85 | 1,426 | 67
Reading | Grade 07 | 43422 | 1,410 | 81 | 1,410 | 66
Reading | Grade 11 | 43297 | 1,170 | 86 | 1,170 | 56
Reading | Grade 11 | 43435 | 1,164 | 87 | 1,164 | 67
Reading | Grade 11 | 43397 | 1,158 | 87 | 1,158 | 74
Research | Grade 04 | 43280 | 2,102 | 93 | 2,102 | 70
Research | Grade 07 | 43468 | 1,406 | 95 | 1,406 | 35
Research | Grade 11 | 43491 | 1,154 | 89 | 1,154 | 28


Table 4. Validity Summary Report for Human Scoring 


Validity Summary Report 
Smarter Balanced Assessment Consortium 


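Subject | Grade | Domain | ID | #R | #NA LO | #LO | #EX | #HI | #NA HI | %NA LO | %LO | %EX | %HI
Brief Write | Grade 04 | | 43403 | 120 | 0 | 1 | 111 | 8 | 0 | 0 | 1 | 93 | 7
Brief Write | Grade 07 | | 43497 | 118 | 0 | 5 | 99 | 14 | 0 | 0 | 4 | 84 | 12
Brief Write | Grade 07 | | 43964 | 152 | 0 | 5 | 145 | 2 | 0 | 0 | 3 | 95 | 1
Brief Write | Grade 11 | | 43446 | 131 | 0 | 8 | 120 | 3 | 0 | 0 | 6 | 92 | 2
Long Write | Grade 04 | Organization | 43504 | 267 | 0 | 16 | 250 | 1 | 0 | 0 | 6 | 94 | 0
Long Write | Grade 04 | Elaboration | 43504 | 267 | 0 | 17 | 249 | 1 | 0 | 0 | 6 | 93 | 0
Long Write | Grade 04 | Conventions | 43504 | 267 | 0 | 27 | 222 | 18 | 0 | 0 | 10 | 83 | 7
Long Write | Grade 04 | Organization | 43334 | 285 | 0 | 9 | 271 | 5 | 0 | 0 | 3 | 95 | 2
Long Write | Grade 04 | Elaboration | 43334 | 285 | 0 | 10 | 271 | 4 | 0 | 0 | 4 | 95 | 1
Long Write | Grade 04 | Conventions | 43334 | 285 | 0 | 4 | 238 | 42 | 1 | 0 | 1 | 84 | 15
Long Write | Grade 04 | Organization | 43284 | 285 | 0 | 4 | 278 | 3 | 0 | 0 | 1 | 98 | 1
Long Write | Grade 04 | Elaboration | 43284 | 285 | 0 | 5 | 278 | 2 | 0 | 0 | 2 | 98 | 1
Long Write | Grade 04 | Conventions | 43284 | 285 | 0 | 9 | 270 | 6 | 0 | 0 | 3 | 95 | 2
Long Write | Grade 07 | Organization | 43438 | 280 | 1 | 11 | 241 | 27 | 0 | 0 | 4 | 86 | 10
Long Write | Grade 07 | Elaboration | 43438 | 280 | 1 | 11 | 241 | 27 | 0 | 0 | 4 | 86 | 10
Long Write | Grade 07 | Conventions | 43438 | 280 | 1 | 11 | 232 | 35 | 1 | 0 | 4 | 83 | 13
Long Write | Grade 07 | Organization | 43703 | 300 | 0 | 9 | 245 | 46 | 0 | 0 | 3 | 82 | 15
Long Write | Grade 07 | Elaboration | 43703 | 300 | 0 | 10 | 244 | 46 | 0 | 0 | 3 | 81 | 15
Long Write | Grade 07 | Conventions | 43703 | 300 | 0 | 19 | 262 | 19 | 0 | 0 | 6 | 87 | 6
Long Write | Grade 07 | Organization | 43469 | 266 | 3 | 4 | 226 | 33 | 0 | 1 | 2 | 85 | 12
Long Write | Grade 07 | Elaboration | 43469 | 266 | 3 | 5 | 226 | 32 | 0 | 1 | 2 | 85 | 12
Long Write | Grade 07 | Conventions | 43469 | 266 | 3 | 8 | 234 | 21 | 0 | 1 | 3 | 88 | 8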


Small Scale Trials 


Subject | Grade | Domain | ID | #R | #EX | %EX
Long Write | Grade 11 | Organization | 43632 | 300 | 253 | 84
Long Write | Grade 11 | Elaboration | 43632 | 300 | 263 | 88
Long Write | Grade 11 | Conventions | 43632 | 300 | 253 | 84
Long Write | Grade 11 | Organization | 43635 | 323 | 284 | 88
Long Write | Grade 11 | Elaboration | 43635 | 323 | 284 | 88
Long Write | Grade 11 | Conventions | 43635 | 323 | 248 | 77
Long Write | Grade 11 | Organization | 43479 | 344 | 287 | 83
Long Write | Grade 11 | Elaboration | 43479 | 344 | 290 | 84
Long Write | Grade 11 | Conventions | 43479 | 344 | 284 | 83
Mathematics | Grade 04 | | 43572 | 90 | 85 | 94
Mathematics | Grade 04 | | 43564 | 90 | 90 | 100
Mathematics | Grade 04 | | 43173 | 90 | 90 | 100
Mathematics | Grade 07 | | 43551 | 90 | 87 | 97
Mathematics | Grade 07 | | 43555 | 162 | 153 | 94
Mathematics | Grade 07 | | 43557 | 90 | 89 | 99
Mathematics | Grade 07 | | 43639 | 99 | 99 | 100
Mathematics | Grade 11 | | 43559 | 90 | 90 | 100
Mathematics | Grade 11 | | 43552 | 90 | 90 | 100
Mathematics | Grade 11 | | 43546 | 90 | 90 | 100
Reading | Grade 04 | | 43707 | 110 | 108 | 98
Reading | Grade 04 | | 43412 | 110 | 94 | 85
Reading | Grade 04 | | 43416 | 78 | 78 | 100
Reading | Grade 07 | | 43248 | 101 | 84 | 83
Reading | Grade 07 | | 43445 | 110 | 92 | 84
Reading | Grade 07 | | 43422 | 110 | 80 | 73

Subject | Grade | ID | #R | #NA LO | #LO | #EX | #HI | %EX | %HI
Reading | Grade 11 | 43297 | 100 | 0 | 6 | 92 | 2 | 92 | 2
Reading | Grade 11 | 43435 | 100 | 3 | 10 | 85 | 2 | 85 | 2
Reading | Grade 11 | 43397 | 100 | 1 | 16 | 80 | 3 | 80 | 3
Research | Grade 04 | 43280 | 100 | 1 | 15 | 81 | 3 | 81 | 3
Research | Grade 07 | 43468 | 100 | 1 | 5 | 85 | 9 | 85 | 9
Research | Grade 11 | 43491 | 100 | 0 | 19 | 81 | 0 | 81 | 0
Total | | | 10,671 | 24 | 493 | 9,402


Item Analysis and Data Review Procedures
Classical Item Analysis 


After the automated scoring of the CR responses, the complete dataset was subjected to item 
analyses. AIR’s analysis program computes individual item and overall test statistics for each 
selected-response and constructed-response item to check the integrity of the item and to verify the 
appropriateness of the difficulty level of the item. The score used to compute the biserial correlations 
and the DIF ability stratification was the raw number-correct score, within form, based on the human 
scores for the constructed-response (CR) items. 


Key statistics that were computed and examined include the following: 


Item Discrimination: The discrimination index is calculated as the correlation between the item score 
and the student’s total number-correct score (biserial correlations for selected-response items and 
polyserial correlations for constructed-response items). Selected-response items were flagged for
subsequent review if the biserial correlation for the item was less than .25 for the keyed (correct)
response and greater than zero for distractors. Constructed-response items were flagged if the 
polyserial correlation was less than .25. 


Item Difficulty: Items that are either extremely difficult or extremely easy were flagged for review but
not necessarily for removal if the item discrimination index was not also flagged. For selected-
response items, the proportion of test-takers in the sample selecting the correct answer (the p-value)
was computed, as were the proportions of those selecting each incorrect response. For constructed-
response items, item difficulty was calculated both as the item’s mean score (average item score)
and as the average proportion correct (analogous to p-value). Items were flagged for review if the p-
value was less than .25 or greater than .90.
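
The difficulty screen above can be sketched as follows. This is a minimal illustration of the stated thresholds, not AIR's analysis program; the function names are hypothetical.

```python
def p_value(responses, key):
    """Proportion of test-takers who selected the keyed (correct) option."""
    return sum(1 for r in responses if r == key) / len(responses)


def average_proportion_correct(scores, max_points):
    """Mean item score divided by the maximum points (analogous to a p-value)."""
    return (sum(scores) / len(scores)) / max_points


def flag_difficulty(p, low=0.25, high=0.90):
    """Flag an item for review when it is extremely hard (< .25) or extremely easy (> .90)."""
    return p < low or p > high


# Selected-response item: three of four students chose the key "A".
sr_p = p_value(["A", "B", "A", "A"], key="A")
# Constructed-response item scored 0-2.
cr_p = average_proportion_correct([0, 1, 2, 1], max_points=2)
print(sr_p, flag_difficulty(sr_p), cr_p, flag_difficulty(cr_p))
```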


Constructed-response items were flagged if there were very few students scoring in a given category 
or if a very high proportion of students fell in any single score-point category. The latter may suggest 
that the other score points are not useful or, if the score point is in the minimum or maximum score- 
point category, that the item may be too difficult. Constructed-response items were also flagged if 
the average ability estimate of students in a score-point category was lower than the average ability 
estimate of students in the next lower score-point category. For example, an item was flagged for 
review if the average total score of those receiving a score of 3 on the constructed-response item 
was lower than the average total score for those receiving a 2 on the constructed-response item. 
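
The score-point ordering check described above can be sketched as below. The sketch uses the total score as the ability measure, as in the example in the text, and compares each observed score point with the next lower observed score point; the function name is hypothetical.

```python
from collections import defaultdict


def out_of_order_categories(item_scores, total_scores):
    """Return (lower, higher) score-point pairs where students in the higher
    category have a lower mean total score than students in the lower one."""
    totals_by_category = defaultdict(list)
    for item_score, total in zip(item_scores, total_scores):
        totals_by_category[item_score].append(total)

    means = {cat: sum(vals) / len(vals) for cat, vals in totals_by_category.items()}
    categories = sorted(means)
    return [
        (lower, higher)
        for lower, higher in zip(categories, categories[1:])
        if means[higher] < means[lower]
    ]


# Students scoring 3 on the item average a lower total score than those scoring 2,
# so the item would be flagged for review.
flags = out_of_order_categories([1, 1, 2, 2, 3, 3], [10, 12, 20, 22, 15, 17])
print(flags)
```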


Differential Item Functioning (DIF): DIF analyses are designed to determine whether students at 
similar levels of ability have different probabilities of answering the same item correctly (or of 
receiving the same scores in the case of constructed-response items) based on group membership. 
A variety of factors may lead to differential item functioning, but DIF may indicate item bias. 


DIF analyses were conducted on all items included in the Small Scale Trial forms to detect potential
item bias for subgroups (sample sizes permitting). The performance on each item by focal group
members (e.g., protected ethnic group members, females) was compared with the performance of
the appropriate reference group (e.g., white students, male students). The purpose of these analyses
is to identify items that may have favored students in one group (reference group) over students of
similar ability in another group (focal group).




The procedures used for detecting DIF are the Mantel-Haenszel (MH, 1959) chi-square for
dichotomous items (multiple-choice items) and Mantel’s chi-square for polytomous items
(constructed-response items). The Mantel-Haenszel statistic (MH D-DIF) is calculated for multiple-
choice items (Holland & Thayer, 1988) and the standardized mean difference (SMD) for constructed-
response items (Dorans & Schmitt, 1991; Zwick, Donoghue, & Grima, 1993) to measure the degree
and magnitude of DIF. The total scale score on the test was used as the ability-matching variable.
The AIR analysis program computes the MH chi-square value, the log-odds ratio, the standard error
of the log-odds ratio, and the MH-delta for the selected-response items, as well as the MH chi-square,
the SMD, and the standard error of the SMD for the constructed-response items. Items were
classified into three categories (A, B, or C) ranging from no DIF to mild DIF to severe DIF according to
the DIF classification convention. Items were also categorized as positive DIF (i.e., A+, B+, or C+),
signifying that the item favored the focal group, or negative DIF (i.e., A-, B-, or C-), signifying that the
item favored the reference group.
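
As an illustration of the MH D-DIF statistic for a dichotomous item, the sketch below uses the standard Holland and Thayer formulation: the common odds ratio is pooled across ability strata and rescaled as -2.35 times its natural log. The report does not show AIR's code, so the function name and the input layout are assumptions.

```python
import math


def mh_d_dif(strata):
    """Mantel-Haenszel D-DIF for one dichotomous item.

    strata: iterable of (ref_correct, ref_wrong, focal_correct, focal_wrong)
    counts, one tuple per matched ability level.
    Negative values indicate the item favors the reference group.
    Raises ZeroDivisionError if the pooled denominator is zero.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_wrong, focal_correct, focal_wrong in strata:
        n = ref_correct + ref_wrong + focal_correct + focal_wrong
        if n == 0:
            continue  # skip empty ability strata
        numerator += ref_correct * focal_wrong / n
        denominator += ref_wrong * focal_correct / n
    alpha_mh = numerator / denominator  # common odds ratio
    return -2.35 * math.log(alpha_mh)


# Reference group outperforms the matched focal group in both strata.
print(mh_d_dif([(30, 10, 20, 20), (40, 10, 30, 20)]))
```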


A DIF classification of C means that the item shows significant DIF and should be reviewed for
potential content bias, differential validity, or other issues that may reduce item fairness. Items in the
C category for any group were flagged for subsequent review by the Fairness Data Review Committee.


Table 5 details the DIF classification rules. 


Table 5. DIF Classification Rules 


DIF Category | Flag Criteria

Dichotomous Items
C | $MH\chi^2$ is significant and $|\Delta_{MH}| \geq 1.5$.
B | $MH\chi^2$ is significant and $|\Delta_{MH}| < 1.5$.
A | $MH\chi^2$ is not significant.

Polytomous Items
C | $MH\chi^2$ is significant and $|SMD|/|SD| \geq .25$.
B | $MH\chi^2$ is significant and $|SMD|/|SD| < .25$.
A | $MH\chi^2$ is not significant.
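
The classification rules in Table 5 can be expressed as a small function. This sketch assumes the thresholds are 1.5 for the absolute MH D-DIF value and .25 for |SMD|/|SD|, that a value exactly at the threshold falls in category C, and that a positive effect size favors the focal group. The function name and parameters are hypothetical.

```python
def classify_dif(chi_square_significant: bool, effect_size: float,
                 polytomous: bool = False) -> str:
    """Return a DIF label such as "A+", "B-", or "C-".

    effect_size: MH D-DIF for dichotomous items, or SMD divided by SD for
    polytomous items (assumed inputs; the significance test is done elsewhere).
    """
    threshold = 0.25 if polytomous else 1.5
    if not chi_square_significant:
        level = "A"
    elif abs(effect_size) >= threshold:
        level = "C"
    else:
        level = "B"
    # Positive values are taken to favor the focal group, negative the reference group.
    sign = "+" if effect_size >= 0 else "-"
    return level + sign


print(classify_dif(True, -1.8))                  # dichotomous item
print(classify_dif(True, 0.10, polytomous=True)) # polytomous item
```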




Item and Test Form Results 


Table 6 presents test score (raw) information, by test form, within content area. For reading and 
writing, the means and standard deviations were similar across grades and forms. The writing forms 
were of moderate difficulty. The reading forms were a little more challenging. The mathematics 
forms tended to be the most challenging and tended to get more difficult with increasing grade. No 
student received the maximum possible points on any test. 


Table 6. Raw Score Descriptive Statistics by Test Form 


Content | Grade | Form | N | Mean | SD | Min | Max | Max Possible Points


Item Analysis Results


Item analysis results are reported, by form, in Tables A1-O1 in Appendix C. Two brief writing items
(43966, grade 4; 43486, grade 11) were removed prior to analysis after content review. Similar to
the test scores, the mean item difficulties for the writing items were of middle difficulty, the reading
items were slightly more difficult, and the mathematics forms had more difficult items. The item
discriminations (biserial/polyserial correlations) were quite high, averaging between the middle .40s
and the middle .50s.


DIF Results 


To avoid large numbers of false positives, DIF analyses were only run if the sample size in the focal 
group was at least 100. As a consequence, the primary comparisons were between male/female 
students and Hispanic/white students. Sample sizes were large enough for some LEP/non-LEP 
comparisons in grade 4 where the sample sizes were larger. Both positive DIF (favoring the focal 
group) and negative DIF (favoring the reference group) were observed. Table 7 provides a count of 
the number of items flagged for DIF and the direction of the DIF. More specific results can be found 
in Tables A2-O2 in Appendix C. There was nothing unusual about the pattern of DIF in these forms. 


Table 7: DIF Flags by Content (Form), Grade, and Comparison 


Content (Form) | Grade | LEP vs. non-LEP | Female vs. Male | Hispanic vs. White
Mathematics
Mathematics
Mathematics
Reading
Reading
Reading
Writing (Form A)
Writing (Form B)
Writing (Form C)
Writing (Form A)
Writing (Form B)
Writing (Form C)
Writing (Form A)
Writing (Form B)
Writing (Form C)


Model Building Analyses 
To evaluate model building success, the following analyses were performed: 


1. a comparison of the descriptive statistics of the item scores under human and automated
model scoring; 

2. the percentage exact agreement and agreement within 1 point (adjacent match), between 
the first human score and the second human score; 

3. the percentage exact agreement and agreement within 1 point (adjacent match), between 
the first human score and the automated model score; 

4. the polychoric and Pearson correlations between the validated human score, the first human 
score and the automated score; 

5. Kappa statistics to assess the amount of agreement in scores over chance agreement. 
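
The agreement measures in items 2, 3, and 5 can be sketched in plain Python. The sketch treats "agreement within 1 point" as including exact matches and uses unweighted Cohen's kappa; the report does not say whether a weighted kappa was used, so that choice is an assumption, as are the function names.

```python
from collections import Counter


def agreement_rates(scores_a, scores_b):
    """Return (exact agreement, agreement within 1 point) as proportions."""
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n
    return exact, within_one


def cohen_kappa(scores_a, scores_b):
    """Unweighted Cohen's kappa: agreement beyond what chance would produce."""
    n = len(scores_a)
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement from the two raters' marginal score distributions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


human = [0, 1, 2, 2]
machine = [0, 1, 1, 2]
print(agreement_rates(human, machine), cohen_kappa(human, machine))
```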


Development of Automated Scoring Models 


Two Scoring Model Approaches 


The two primary classes of automated scoring models examined here are: the empirically developed 
black-box model, primarily used for scoring long writes/essays, and the theoretically based glass-box 
models driven by the specificity of the scoring rubric, primarily used for scoring short semantic 
responses. Full rubrics for the Small Scale Trial items can be found in Appendix B. 


The development of the engines is an iterative process. This is especially true when newly developed 
models are used. Therefore, following the initial scoring model development, cases where the 
scoring engine fails to accurately score a response are examined initially on a regular basis and 
periodically thereafter. This information is used to identify places where the scoring model can be 
improved. The recommended safety measure is for humans to periodically score a sample of papers 
for each model and then compare these responses with the automated scores during operational 
scoring. 


Automated Essay Scoring Model (Black-Box Model) 


The development of an automated essay scoring model is a data-driven approach. While black-box 
essay scoring engines may correlate reasonably well with human scores, they only incorporate 
shallow semantics. They do not evaluate the logic or quality of argumentation. These elements are
within the scope of the Consortium scoring rubrics. For example, the “Evidence and Elaboration” 
domain for the argumentative rubric contains the specifications identified in Exhibit 1. 




Exhibit 1: Evidence and Elaboration 
4 point score description, Argumentation 


The response provides thorough and 
convincing support/evidence for the 
controlling idea or main idea that 
includes the effective use of 
sources, facts, and details. The 
response clearly and effectively 
expresses ideas, using precise 
language: 


• comprehensive evidence from
sources is integrated


• references are relevant and
specific


• effective use of a variety of
elaborative techniques is
demonstrated


Use of domain-specific vocabulary is 
clearly appropriate for the audience 
and purpose. 


Essay scoring engines can pick up on relevance of the vocabulary to the topic in question, but have 
no access to the logic, other than through the incidental correlations with vocabulary or other 
syntactic features of the text. Proposition scoring engines cannot currently score such responses 
because the number and complexity of possible reasonable arguments cannot realistically be 
enumerated. Hence, the valid scoring of such rubric elements extends beyond the current state of 
the art. 


That said, the naturally occurring correlations between the features these engines measure and
the writing traits that we want to measure often result in accurate scoring. It is for the less
common responses, or for responses with good argumentation but (for example) poor spelling or
word choice, that the correlations would not prove accurate predictors.


Human Scoring for Model Training 


Black-box models are developed in two phases, a training phase and a validation phase. The training 
for black-box scoring models uses human ratings as the primary source of information for developing 
the automated scoring model. As such, the quality of the automated model is related to the level of 
agreement between the human raters. This will be reflected in the consistency of the scores 
assigned. To provide the best information for the automated model development process, it is
important that the human ratings be as accurate as possible. The best way to accomplish this is to
have each student response scored by two trained human scorers. If the two scores do not match
exactly, the response should be sent to an expert scorer for resolution. The score
used in the model building should include the information from this resolution process.


During the training phase, the scoring engine is subjected to exemplars that define the bounds 
within which to recognize patterns. This is a data-driven approach, so human scoring needs to have 
been completed prior to the development of the model. This methodology is applied to responses 
that are varied and require complex modeling, including the extraction of syntactic and semantic 
features, followed by feature-space mapping and dimension reduction approaches, followed in turn 
by regression or other statistical prediction of the validated human scores based on the dimension- 
reduced feature-space. 
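
The overall shape of such a pipeline can be sketched as follows. This is only an illustration under stated assumptions: the three surface features are invented for the example and are not the features used by the engine described here, the dimension reduction step is omitted, and ridge-regularized least squares stands in for "regression or other statistical prediction." All function names are hypothetical.

```python
def extract_features(text):
    """Illustrative surface features: word count, mean word length, type-token ratio."""
    words = text.split()
    n = len(words)
    if n == 0:
        return [0.0, 0.0, 0.0]
    mean_length = sum(len(w) for w in words) / n
    type_token = len({w.lower() for w in words}) / n
    return [float(n), mean_length, type_token]


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_ridge(feature_rows, human_scores, penalty=1e-3):
    """Fit intercept and weights to validated human scores (training phase).

    For simplicity the small ridge penalty is applied to the intercept as well.
    """
    rows = [[1.0] + list(f) for f in feature_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (penalty if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, human_scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, features, min_score, max_score):
    """Predict a score, rounded and clipped to the rubric's score range."""
    raw = weights[0] + sum(w * f for w, f in zip(weights[1:], features))
    return max(min_score, min(max_score, round(raw)))
```

In use, `fit_ridge` would be run on the training responses and their validated human scores, and `predict` would then be compared with human scores on the held-out validation responses.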


Propositional Scoring (Glass Box) Model 


The proposition scoring engine is a glass-box model for which test developers build explicit rubrics. 
This model uses a set of (potentially interrelated) propositions. This approach differs from natural 
language understanding in that it seeks to recognize relationships specified by the rubric author, 
rather than to infer relationships from natural language. 


Broadly speaking, propositions are built from concepts and relationships. A concept is a collection of 
words that have similar meaning. Similarity may be defined as synonymy, ontological relationships, 
or other relationships that may be selected by the rubric author. Concepts may be modified by 
specifying their scope. For example, a dog refers to any instance of a dog, while the dog refers to a 
specific instance of a dog, and dogs will refer to the abstract entity rather than any single instance. 


Relationships are represented using triplets, containing an agent, an object, and a relationship. Each 
element in the triplet may be modified or expanded by attributes. The triplets represent syntax-
independent descriptions of concepts. In most cases, relations correspond to verbs. For example,
"a dog chased a cat," "the dog chased a cat," and "the cat was chased by the dog" are all
represented by the relational triplet in Figure 1.


[Figure 1 image not reproduced; the only label recoverable from it is "Relation: Chase".]


Figure 1: Relational Concept representing "the dog chased the cat"


The sample sentences might be distinguished by adding qualifiers. For instance, specifying that the 
dog was a named instance might preclude a match in a sentence referring to a dog. 


The determination of equivalence concepts is useful for defining the synonymy between concepts 
and their relational equivalents. Continuing the example above, the dog treed a cat is just another 
way of saying that the dog chased the cat up the tree. Recognizing the equivalence, however,
requires the semantic knowledge that to “tree” means to “chase up a tree.” Equivalence concepts
provide a mechanism for encoding this semantic information. 


When an examinee response is captured, it is first parsed into a syntactic parse tree (like a sentence 
diagram), using a parsing algorithm. The matching algorithm then searches the parse tree for 
evidence of the propositions defined in the rubric. A final scoring stage assigns scores based on 
Boolean collections of propositions. 
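
The matching and scoring stages can be illustrated with a minimal sketch. It assumes the parsing stage has already reduced the response to agent/relation/object triplets with passive voice normalized; the concept table, class, and function names are hypothetical and are not taken from the engine described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    agent: str
    relation: str
    obj: str


# Hypothetical concept table: each concept is a collection of words
# that the rubric author treats as having similar meaning.
CONCEPTS = {
    "DOG": {"dog", "puppy", "hound"},
    "CAT": {"cat", "kitten"},
    "CHASE": {"chase", "chased", "pursue", "pursued"},
}


def concept_matches(concept, word):
    """True if the word belongs to the named concept."""
    return word.lower() in CONCEPTS[concept]


def proposition_found(proposition, parsed_triplets):
    """True if any parsed triplet matches the proposition in all three slots."""
    return any(
        concept_matches(proposition.agent, t.agent)
        and concept_matches(proposition.relation, t.relation)
        and concept_matches(proposition.obj, t.obj)
        for t in parsed_triplets
    )


def score_response(parsed_triplets, rubric):
    """Assign the points of the first rubric rule whose propositions are all present.

    rubric: list of (points, required_propositions), highest score first.
    """
    for points, required in rubric:
        if all(proposition_found(p, parsed_triplets) for p in required):
            return points
    return 0


rubric = [(1, [Triplet("DOG", "CHASE", "CAT")])]
# Assumed parser output for "the cat was chased by the dog".
parsed = [Triplet("dog", "chased", "cat")]
print(score_response(parsed, rubric))  # 1
```

Here each rule is an AND over its required propositions and the ordered rule list acts as an OR, which is one simple way to express a Boolean collection of propositions.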


Test developers build the concepts to be scored in student responses through specifications that fit 
within the specific scoring engine template based on the scoring rubric for the item type. The 
validation of glass-box models entails rubric validation to verify that the logic and salient features of 
the rubric are complete. When selecting papers for rubric validation, a successful strategy for
refining the machine-scored rubric is to disproportionately select high scores on the target items
received by otherwise low-scoring students and low scores on the target items received by
otherwise high-scoring students.


Items that are intended to be scored with a glass-box approach will be automatically scored using
the machine-scored rubric created during item development. These items will have preexisting 
scores that will be validated during the range finding/rubric validation process. The items will have a 
range of correct responses, and the range finding/rubric validation committee will validate or adjust 
the computer-generated scores or broaden the scoring rubric to encompass additional valid 
responses. 


Scoring Model Results 
Essay (black box) model results 


The long-write essay items from the writing test forms were scored with rubrics for three domains 
(organization, elaboration, and conventions). Separate scoring models were developed for each 
domain. Descriptive information about sample size, average essay response length, first and second 
human scores, and automated scores can be found in the first 4 sections of Table 8. The last two 
sections of the table compare human/human scores and human/automated scores, respectively. It 
was planned to obtain 1000 responses per item using 500 responses to train the scoring engine 
and 500 responses to validate the model. In most cases this was achieved; however, the observed 
sample sizes were smaller in grade 11. When the sample sizes were small, between 400 and 450
responses were used to train the engine and the rest were used to validate the model. This
sometimes resulted in small numbers of cases for validation. However, it is important to build a 
stable model or the validation results will suffer as a consequence. 
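
The training/validation division described above can be sketched as follows. The report does not say how responses were assigned to the two sets; random assignment with a fixed seed is an assumption made for this example, and the function name is hypothetical.

```python
import random


def split_responses(responses, train_size=500, seed=0):
    """Randomly divide responses into a training set and a validation set."""
    if not 0 < train_size < len(responses):
        raise ValueError("train_size must leave at least one validation response")
    shuffled = list(responses)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:train_size], shuffled[train_size:]


# Planned design: 1000 responses, 500 to train and 500 to validate.
train, validate = split_responses(range(1000), train_size=500)
# Small-sample case: 450 to train, the remainder to validate.
train_small, validate_small = split_responses(range(520), train_size=450)
print(len(validate_small))  # 70
```

The second call shows why small samples leave few cases for validation: holding the training set at 400 to 450 responses keeps the model stable at the expense of the validation set.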


Essay response length tended to increase by grade, with grade 4 students producing noticeably 
shorter responses. For the organization and elaboration scores, the means and standard deviations 
were very similar between the human and automated scores. Agreement and correlational 
measures tended to be slightly lower for the human/automated values than for the human/human 
indices for these measures. Two cross-tabulated agreement tables were constructed for each item: 
one compared the first human score with the second human score; the other compared the first 
human score with the score produced by the automated scoring engine. The human/human table 
should be used as a baseline against which the human/automated results are compared. The 
tables containing these results can be found in Appendix D. Agreement for both the
human/automated and the human/human comparisons was uniformly high for the organization
and elaboration rubrics, though the human/automated indices were always lower.


For the conventions rubric, the results are less consistent, particularly for the fourth grade items. 
The means were further apart and the standard deviations for the automated model were uniformly 
smaller. Agreement and correlational measures were substantially lower for the human/automated 
comparisons than they were for the human/human comparisons. Human/human agreement for
conventions was lower than expected; research on this issue is ongoing. The automated model
score distributions, shown in Appendix D, tended to be more peaked
than the human distribution. There were fewer extreme scores assigned by the automated model 
(regression effect). 


Propositional (glass box) model results 


For constructed-response items, descriptive information about sample size, average response length, 
first and second human scores, and automated scores can be found in the first four sections of
Table 9. The last two sections compare human/human scores and human/automated scores,
respectively. The sample size issues outlined above hold for these items as well. With one exception, 
501 responses were used to hone the propositional model rubrics. For item 43564, 471 responses 
were used for this purpose. The response length for these items tended to be between 20 and 40 
words. Writing items in grades 7 and 11 tended to be a little longer; mathematics items in grade 4
tended to be between 10 and 15 words.


The validity standard used to qualify raters was an exact agreement of 90% for a 1-point item and 80% 
for a 2-point item. Overall, the performance of the scoring engine met this standard for 50% of the 
items across all grades and subjects. 
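
The validity standard amounts to a simple threshold check, sketched below. Treating the thresholds as "at least" values is an assumption, and the function names are hypothetical.

```python
# Exact-agreement thresholds (percent) by maximum item score, as stated above.
THRESHOLDS = {1: 90.0, 2: 80.0}


def meets_validity_standard(exact_agreement_pct, max_points):
    """True if exact agreement reaches the threshold for the item's point value."""
    if max_points not in THRESHOLDS:
        raise ValueError("standard is defined only for 1- and 2-point items")
    return exact_agreement_pct >= THRESHOLDS[max_points]


def percent_meeting(items):
    """Percentage of (exact_agreement_pct, max_points) items meeting the standard."""
    met = sum(meets_validity_standard(pct, pts) for pct, pts in items)
    return 100.0 * met / len(items)


print(meets_validity_standard(85.0, 1))  # False
print(meets_validity_standard(85.0, 2))  # True
```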


Mathematics 


About 67% of the items (six out of nine) met the validity standard used to qualify raters,
including the two cases in which the engine performed exactly as well as humans. Overall,
agreement and correlational measures for the human/automated relationship tended to be the
same as or slightly lower than the human/human indices for these measures. The cross-tabulated
agreement tables, shown in Appendix E, compare the first human score with the second human
score and the first human score with the automated score. As indicated in the tables, more low
or 0 scores were produced using the propositional scoring model.


There was an issue with some mathematics items, unrelated to the scoring model, in that all or 
almost all students received a score of 0 from human scorers. This was true for item 43564 in grade
4, and items 43559 and 43552 in grade 11. 




Reading 


The reading responses were about the same length across grades. For only 33% of the items, the 
engine performed well enough to meet the validity standard. For the items in which the engine failed 
to meet the standard, the exact agreements between human scorers tended to be about 10% lower 
than in ‘met-standard’ cases. This may suggest that the rubrics were not as clearly specified as 
needed for these items. Overall, agreement and correlational measures tended to be lower for the 
human/automated score comparison than for the human/human score comparison. The cross- 
tabulated agreement tables comparing the first human score with the second human score and the 
first human score with the score produced by the automated scoring engine indicate that the 
automated scoring engine produced lower scores with substantial numbers of zero scores. The 
agreement tables showing this can be found in Appendix E. 


Writing 


The writing items where the propositional model was applied were limited in number, with only one 
item per grade. The engine performed well for two of the three items. The engine met the validity 
standard for the grades 4 and 7 items. The grade 7 item showed not only good exact agreement but 
comparable means between the human and automated scores. The standard deviations for the 
automated model were smaller than or similar to the human score for the grades 4, 7, and 11 items. 
The agreement and correlational measures were lower for the human/automated score comparison 
than for the human/human score comparison, particularly for the grade 11 item. The cross-tabulated 
agreement tables comparing the first human score with the second human score and the first 
human score with the score produced by the automated scoring engine support these results. The 
agreement tables showing this can be found in Appendix E. 




Table 8. Statistics Summary for the Multiple Dimension Essay Writing Items 


[Table 8 body is not legible in this copy. For each item ID and scoring dimension (Organization,
Elaboration, Conventions), the table reports sample size and average response length; descriptive
statistics for the first human scorer, the second human scorer, and the automated score; and
agreement, correlation, and kappa statistics for the first vs. second human scorers and for the first
human scorer vs. the automated score. Item IDs legible in the residue: 43504, 43334, 43284,
43438, 43703, 43469, 43632, 43635, 43479.]


Table 9. Statistics Summary of Propositional Model 


[Table 9 body is not legible in this copy. For each constructed-response item (item ID, subject,
and grade), the table reports sample size and mean word count; descriptive statistics for the first
human scorer, the second human scorer, and the automated score; agreement and correlation
statistics for the first vs. second human scorers and for the first human scorer vs. the automated
score; and whether the validity standard (90% exact agreement for 1-point items, 80% for 2-point
items) was met.]


Conclusions 
Essay (Black Box) Model 


The primary focus of the Small Scale Trials was on the development and application of two 
automated scoring models, the essay (black box) scoring model and the propositional (glass box) 
scoring model. The essay (black box) scoring model was applied separately to scoring rubrics for 
three domains (organization, elaboration, and conventions). Separate scoring models were 
developed for each. 


The organization and elaboration models performed as well as any automated scoring models 
presently available. The means and standard deviations were very similar between the human and 
automated scores. Agreement and correlational measures tended to be slightly lower for the 
human/automated values than for the human/human indices for these measures, but this is not 
unusual for automated essay scoring models. 


For the conventions rubric, the results were less consistent, particularly for the fourth grade items. 
There was less agreement in the means, and the standard deviations for the automated model were 
uniformly smaller. Agreement and correlational measures were substantially lower for the 
human/automated comparisons than for the human/human comparisons. The automated model 
score distributions tended to be more peaked than the human distributions. There were fewer 
extreme scores assigned by the automated model (regression effect). The exact reason for this is 
unclear at the moment. The conventions rubric was shorter (3 points) versus the longer score scale 
for the organization and elaboration rubrics (5 points). The distributions clearly show that the 
automated model was conservative, tending to produce the middle score. These factors could have 
contributed to the poor performance with the conventions rubric. 


The inter-rater reliability between human scorers for the conventions domain was lower
than expected. This issue is currently being researched.


Propositional (Glass Box) Model 


The propositional scoring model performed well enough to meet the validity standard for 50% of the 
constructed-response items across all grades and subjects. 


Mathematics 


The scoring engine met the validity standard for 67% of the items, though the means for the 
automated model tended to be lower than those for the human produced scores. The standard 
deviations were very similar to those obtained by human scoring, only slightly smaller. Agreement
and correlational measures for the human/automated relationship ranged from the same as or
slightly lower than the human/human values for these indices to substantially lower.


Reading 


The reading responses were about the same length across grades. For one-third of the items, the
engine’s performance met the validity standard. This low proportion may be partly due to the items
themselves, because exact agreement between human scorers was also considerably lower in most
‘not-met-standard’ cases. Agreement and correlational measures tended to be lower for the
human/automated score comparison than for the human/human score comparison.


Writing 


The writing items were limited in number, with only one item per grade. The engine’s performance 
met the validity standard for two of the three items. The grade 7 item showed not only good exact 
agreement but comparable means between the human and automated scores. The standard 
deviations for the automated model were smaller than or similar to those provided by the human 
raters. The agreement and correlational measures were lower for the human/automated score 
comparison than for the human/human score comparison, particularly for the grade 4 and 11 items. 


Additional work is needed to better understand and improve the propositional model. Models that 
are content-based and try to tap into the semantics of an item are a larger challenge than the black 
box model that relies on structural components in an essay or long writing item. The responses tend 
to be shorter, often only two or three sentences, which limits the information available for model 
development. The Small Scale Trial study is only the first look at the application of these types of 
models. The information learned here will be applied to the pilot and field test items in an effort to 
improve the scoring of these types of items. 


Automated Scoring Support 


The automated scoring software has been developed to work as well as any engine currently 
available from any source. This level of assurance does not mean that all performance tasks can be 
scored using automated models. Efforts will continue to be made to expand this capability in order to 
reach the Consortium goal of providing authentic assessments that are scored in real time. 


The pilot test will provide information about which types of items can currently be scored 
automatically and which types cannot. It should also yield information about how consistent a 
specific scoring model is in scoring. Decisions will have to be made as to whether an automated
scoring model is sufficient to provide the final score or if some type of human scoring support is
desired. 


There is often interest in providing a “safety net” when automated scoring models are used for 
operational assessments. One answer is to double-score all student responses, first with an 
automated model, often providing a score in the field, and then, soon after, with a single human 
score provided through a distributed scoring network as a check. If differences are found between 
the two scores, the human score can become the official score or the response can be routed to a 
master grader to provide the official score. This effectively cuts the cost and time of a complete 
human double-scoring and allows preliminary scores to be reported in the field. It also provides 
added support to a single human scoring model because two scores are generated. 


A further refinement to this model is to limit the number of cases referred to human scoring by using 
prediction models to identify cases where the score provided by the automated scoring engine is not 
in keeping with the student’s performance on the rest of the test. Cases where the score provided by 
the automated scoring model falls outside of some tolerance band based on a predicted item score 
would be referred to a human for score verification. However, cases where the automated model 
score is in keeping with the predicted score based on the student’s performance on the remainder of 
the test would not be referred for human scoring. This process can be further refined to refer cases
for human scoring only if a possible change in score would change the classification status of the
student.
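
The referral logic described above can be sketched as follows. The sketch assumes that total scores are simple raw sums, that classification is determined by cut scores on that total, and that the predicted item score comes from some model not shown here; all names and numbers are hypothetical.

```python
def outside_tolerance(automated_score, predicted_score, tolerance):
    """True if the automated score falls outside the band around the prediction."""
    return abs(automated_score - predicted_score) > tolerance


def classification(total_score, cut_scores):
    """Performance level: the number of cut scores at or below the total."""
    return sum(total_score >= cut for cut in cut_scores)


def refer_to_human(automated_score, predicted_score, tolerance,
                   rest_total, max_item_score, cut_scores):
    """Refer only if the score is out of band AND a rescore could change the level."""
    if not outside_tolerance(automated_score, predicted_score, tolerance):
        return False
    current = classification(rest_total + automated_score, cut_scores)
    return any(
        classification(rest_total + score, cut_scores) != current
        for score in range(max_item_score + 1)
    )


# Automated score of 0 on a 2-point item where 1.8 was predicted; the student
# sits one point below the first cut score, so a rescore could change the level.
print(refer_to_human(0, 1.8, 1.0, rest_total=19, max_item_score=2,
                     cut_scores=[20, 30]))  # True
```

A student far from every cut score would not be referred even with the same out-of-band item score, which is what limits the number of human reads.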


Another means to support the scoring system is to route cases to human scorers only if there are 
significant “person fit” issues. Person-fit flagging occurs when, for example, a person answers
difficult items correctly but answers easy items incorrectly. 
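
One simple way to quantify this pattern is to count Guttman errors, that is, item pairs in which the easier item was answered incorrectly and the harder item correctly. The sketch below is an illustration only: the report does not name a person-fit statistic, and the dichotomous scoring, the normalization, and the 0.5 flagging threshold are assumptions.

```python
def guttman_errors(responses, difficulties):
    """Count pairs where an easier item is wrong and a harder item is right.

    responses: 0/1 item scores; difficulties: larger values mean harder items.
    """
    order = sorted(range(len(responses)), key=lambda i: difficulties[i])
    errors = 0
    wrong_so_far = 0  # incorrect answers among the easier items already visited
    for i in order:
        if responses[i] == 0:
            wrong_so_far += 1
        else:
            errors += wrong_so_far
    return errors


def normalized_errors(responses, difficulties):
    """Guttman errors as a fraction of the maximum possible for this raw score."""
    correct = sum(responses)
    maximum = correct * (len(responses) - correct)
    if maximum == 0:  # all right or all wrong: no misfit can be observed
        return 0.0
    return guttman_errors(responses, difficulties) / maximum


def flag_person_fit(responses, difficulties, threshold=0.5):
    """Flag the examinee when the normalized error rate exceeds the threshold."""
    return normalized_errors(responses, difficulties) > threshold


difficulty = [1, 2, 3, 4, 5]
print(flag_person_fit([1, 1, 1, 0, 0], difficulty))  # False: expected pattern
print(flag_person_fit([0, 0, 1, 1, 1], difficulty))  # True: easy wrong, hard right
```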


Field Test/Operational Scoring Plan 


Given the present state of the art of automated scoring, it is anticipated that the operational scoring 
model should use automated scoring of some sort for most items, but selectively route those
responses that were most tenuously scored to human scorers. We expect that many constructed
response item types will be scoreable with sufficient confidence that targeted human backreads will 
prove unnecessary. In particular, equation response and graphic response items are likely to score 
with sufficient confidence. 


Textual responses are the most challenging to score, using either black-box or glass-box engines. 
This challenge arises in part from the diversity of language use, and in part from the less explicit, 
objective criteria for correctness. Essay length responses can often be scored as accurately by 
scoring engines as by human readers, at least for some types of rubrics; however, as mentioned 
above, the approach fails in important cases. Shorter responses that have explicitly enumerated 
correct answers can often be scored accurately, though generally not quite as accurately as by well- 
trained human readers. Automated scoring approaches tend to perform less well on short texts 
scored on less explicit rubrics. 


It may be possible for the Consortium to develop improved engines for scoring brief writes. The
approach might include integrating black-box and glass-box approaches, while at the same time 
making the rubrics more item specific and explicit. 


With these text-processing engines in hand, the Consortium can begin to craft a specific approach to 
scoring. To support any kind of performance task with a black box or glass box model, it is 
recommended that a process of automatically identifying “suspicious responses” be put in place. By
“suspicious,” we mean responses that are more likely to have been mis-scored by the automated
engine. These cases, as well as other random cases, periodically sampled, should be routed to
human scorers for back-reading.




References 


Attali, Y., Powers, D., Freedman, M., Harrison, M., & Obetz, S. (2008). Automated scoring of short- 
answer open-ended GRE subject test items. ETS GRE Board Research Report No. 04-02, 
ETS RR-08-20. Princeton, NJ: ETS.


Bennett, R. E., Steffen, M., Singley, M. K., Morley, M., & Jacquemin, D. (1997). Evaluating an 
automatically scorable, open-ended response type for measuring mathematical reasoning 
in computer-adaptive tests. Journal of Educational Measurement, 34(2), 162-176. 


Dorans, N. J., & Schmitt, A. P. (1991). Constructed-response and differential item functioning: A
pragmatic approach (ETS Research Report No. 91-47). Princeton, NJ: Educational Testing
Service. [Also appears in Construction vs. Choice in Cognitive Measurement (pp. 135-166), R.
Bennett & W. C. Ward (Eds.), 1993, Hillsdale, NJ: Erlbaum.]


Holland, P. W. (1985). On the study of differential item difficulty without IRT. Proceedings of the 
Military Testing Association. 


Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel 
procedure. In H. Wainer & H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Erlbaum.


Klein, S. P. (2008). Characteristics of hand and machine-assigned scores to college students’ 
answers to open-ended tasks. In D. Nolan & T. Speed (Eds.), Probability and statistics: 
Essays in honor of David A. Freedman, Vol. 2 (pp. 76-89). Beachwood, OH: Institute of 
Mathematical Statistics. 


Williamson, D. M., Bejar, I. I., & Hone, A. S. (1999). “Mental model” comparison of automated and
human scoring. Journal of Educational Measurement, 36(2), 158-184. 


Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for 
validating computer-automated scoring. Applied Measurement in Education, 15(4), 391- 
412. 


Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of Differential Item Functioning for 
Performance Tasks. Journal of Educational Measurement, 30(3), 233-251. 




Appendix A 




AMERICAN INSTITUTES FOR RESEARCH ® 


TO: Smarter Balanced Assessment Consortium 
FROM: AIR Technical Team 
RE: Sampling Plan for SBAC 2012 Small-Scale Trials 


DATE: August 29, 2012 


This memo describes the sampling plan for the SBAC small scale trials. If you have questions please 
contact Gary W. Phillips by email at gwphillips@air.org or by phone at (202) 403-6916. 


Due to tight schedules and budget considerations the 2012 SBAC small-scale trials will be scaled 
back. It has been decided by SBAC that most of the planned research associated with the small- 
scale trials should move to the pilot test component. However, one piece that cannot be moved is 
the piece that informs the automated scoring strategy. 


Much of the research that had been planned for the small-scale trials was designed to generate 
evidence of the construct validity of the test and the impact of accommodations on the construct 
validity. The likely outcome of this research, if validity problems are identified, would be to change 
the presentation of items, eliminate certain classes of items from the pool, or alter the way that 
accommodations are implemented. 


However, information from the pilot test will begin to become available in May, 2013 (about halfway 
through the expected item development for the field test), and information about the automated 
scoring will not become available until August. Given that content and fairness committee meetings 
are usually held in the summer and that two-thirds of the test development period will have passed 
by then, it is probably too late for an overhaul of the scoring rubrics and machine-scored constructed- 
response items. If these data can flow from the trials, they begin to become available in May, about 
halfway through the field-test development period. While this schedule is still tight and risks some 
rework, getting automated scoring information from the small-scale trials offers the best chance for 
success. 


The small-scale trial will be limited to grades 4, 7 and 11. Within each grade, a separate 15-item test 
consisting of selected-response and well-known technology enhanced (TE) item types will be 
constructed for reading and math. These selected-response item sets will provide a way to evaluate 
the constructed-response score correlations with the overall test. These data will also be used in 
strategies to detect anomalous responses. 




Forms Design 


AIR will create 15 selected-response (SR) and three constructed-response (CR) items (intended for 
machine-scoring) in each content area (reading, writing and math) at each grade level (grades 4, 7 
and 11). Each participating student would take one of the following forms at the appropriate grade 
level: 


• Form 1: Reading - 15 SR and 3 CR
• Form 2: Writing
  o 15 SR (9 reading SR and 6 writing SR)
  o Brief writing CR A, writing prompt X
• Form 3: Writing
  o 15 SR (9 reading SR and 6 writing SR)
  o Brief writing CR B, writing prompt Y
• Form 4: Writing
  o 15 SR (9 reading SR and 6 writing SR)
  o Research CR C, writing prompt Z
• Form 5: Mathematics - 15 SR and 3 CR


Sampling Design 


The current plan is to administer the small-scale trial in October 2012. AIR will draw a two-stage 
stratified random sample in grades 4, 7 and 11. Three independent samples will be drawn, one for 
each grade. The first stage will be a sample of schools that is representative of 25 SBAC states. 
Within each state a random sample of schools will be obtained that is proportional to the number of 
schools within the state. We will sample a 20% overage in the number of schools to help 
compensate for school non-response. Furthermore, within each state, schools will be implicitly 
stratified by (1) urbanicity (urban, suburban or rural), (2) school size (small, medium or large), (3) 
socio-economic status (low, medium or high free/reduced lunch), and (4) race/ethnicity (primarily 
white, black, Hispanic, other). The second stage will be a random sample of one classroom within 
each selected school. 


• All students within the selected classroom will be tested.
• All five forms will be spiraled within each classroom (randomly distributed).
• A selected school may be tested in only one grade.


Each selected school will randomly select one classroom. This will be done by first alphabetizing the 
classrooms within the school by teacher’s first name for a given grade. The school will select the 
middle classroom on the list if there are an odd number of classrooms. The school will select the 
classroom just below the middle if there is an even number of classrooms. 
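
The classroom selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `select_classroom` is hypothetical, and the sketch assumes that "just below the middle" means the entry immediately after the midpoint when reading down the alphabetized list.

```python
def select_classroom(teacher_first_names):
    """Pick one classroom, identified by the teacher's first name.

    Assumption (not stated in the memo): for an even count, "just below
    the middle" is the entry immediately after the midpoint of the list.
    """
    if not teacher_first_names:
        raise ValueError("a school needs at least one classroom")
    # Alphabetize the classrooms by teacher first name, ignoring case.
    ordered = sorted(teacher_first_names, key=str.casefold)
    # len // 2 is the exact middle for an odd count and the entry just
    # after the midpoint for an even count.
    return ordered[len(ordered) // 2]


print(select_classroom(["Dana", "Alex", "Carl"]))          # odd count
print(select_classroom(["Dana", "Alex", "Carl", "Beth"]))  # even count
```

Because the rule is deterministic once the list is alphabetized, a school cannot influence which classroom is tested.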




Table 1 shows the number of schools in SBAC sampling frame. The frame is based on the Public 
Elementary/Secondary Universe Survey Data, which is part of the 2009-2010 Common Core Data 
(CCD) provided by the National Center for Education Statistics (NCES). 

Table 1. Number of Schools in the SBAC Small-Scale Trials Sampling Frame 


School Population 

State | Grade 4 | Grade 7 | Grade 11 
CA | 5,642 | 2,687 | 2,123 
… | 1,762 | … | … 
… | 1,107 | … | … 
… | 1,332 | … | … 
… | 1,708 | … | … 
VT | 214 | 123 | 61 
WA | 1,102 | 497 | 422 
WV | 419 | 200 | 146 
WI | 1,105 | 618 | 553 
… | … | … | … 
Total | 20,948 | 11,022 | 9,064 


Explicit Stratification 


States will be used for explicit stratification. In explicit stratification the population frame is divided 
into mutually exclusive strata, and then a sample is drawn within each stratum. In this case the 
population frame is explicitly stratified by state. Then within each state the population of schools will 
be implicitly stratified. This is obtained by sorting schools according to the implicit stratification 
variables, then using systematic sampling to select a simple random sample of schools from the 
ordered list. The number of schools selected per state is indicated in Table 2. The sampling fraction 
is the proportion of schools sampled from the population of schools. The sampling fraction is used to 
sample enough schools to ultimately provide the student sample size required in the SBAC scope of 
work (about 15,000 students). We assume that about 80% of the selected schools will actually 
participate. 
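
As a rough illustration of how the sampling fraction and the assumed participation rate translate into school counts, the following Python sketch reproduces the California grade 4 figures. The function name `allocate_schools` and the use of ordinary rounding are assumptions made for this example; the memo does not state the rounding rule.

```python
def allocate_schools(frame_count, sampling_fraction, participation_rate=0.80):
    """Return (selected, target) school counts for one state and grade.

    Assumption: counts are rounded to the nearest whole school.
    """
    # Number of schools drawn from the state's frame.
    selected = round(frame_count * sampling_fraction)
    # Number of those schools expected to actually participate.
    target = round(selected * participation_rate)
    return selected, target


# California, grade 4: 5,642 schools in the frame, sampling fraction 0.015.
print(allocate_schools(5642, 0.015))  # (85, 68)
```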


Table 2. Smarter Balanced School Sample 


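State | Selected Grade 4 | Selected Grade 7 | Selected Grade 11 | Target Grade 4 | Target Grade 7 | Target Grade 11 
AL | 11 | 14 | 11 | 9 | 11 | 9 
CA | 85 | 78 | 64 | 68 | 62 | 51 
CT | 9 | 8 | 7 | 7 | 6 | 6 
DE | 2 | 1 | 1 | 2 | 1 | 1 
HI | 3 | 2 | 2 | 2 | 2 | 2 
ID | 5 | 6 | 6 | 4 | 5 | 5 
IA | 10 | 11 | 12 | 8 | 9 | 10 
KS | 11 | 12 | 10 | 9 | 10 | 8 
ME | 4 | 4 | 3 | 3 | 3 | 2 
MI | 26 | 28 | 29 | 21 | 22 | 23 
MO | 17 | 20 | 17 | 14 | 16 | 14 
MT | 5 | 7 | 5 | 4 | 6 | 4 
NV | 5 | 4 | 3 | 4 | 3 | 2 
NH | 4 | 4 | 3 | 3 | 3 | 2 
NC | 20 | 18 | 16 | 16 | 14 | 13 
ND | 4 | 5 | 5 | 3 | 4 | 4 
OR | 11 | 11 | 9 | 9 | 9 | 7 
PA | 26 | 27 | 21 | 21 | 22 | 17 
SC | 9 | 9 | 6 | 7 | 7 | 5 
SD | 4 | 6 | 5 | 3 | 5 | 4 
VT | 3 | 4 | 2 | 2 | 3 | 2 
WA | 17 | 14 | 13 | 14 | 11 | 10 
WV | 6 | 6 | 4 | 5 | 5 | 3 
WI | 17 | 18 | 17 | 14 | 14 | 14 
WY | 3 | 3 | 3 | 2 | 2 | 2 
Total | 317 | 320 | 274 | 254 | 255 | 220 

Total selected schools: 911 
Total target schools: 729 

Grade | Sampling Fraction | Participation Rate 
Grade 4 | 0.015 | 0.80 
Grade 7 | 0.029 | 0.80 
Grade 11 | 0.030 | 0.80 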


Implicit Stratification 


The above sample of schools will be proportionally allocated across the implicit strata. This is 
referred to as systematic simple random sampling. Let’s use California to illustrate systematic 
simple random sampling. In California we have M = 5,642 schools at grade 4 and we want to sample 
m = 85 schools. In this case we would sort the M schools in California by (1) urbanicity (urban, 
suburban or rural), (2) school size (small, medium or large), (3) socio-economic status (low, medium 
or high free/reduced lunch), and (4) race/ethnicity (primarily white, black, Hispanic, other). Then with 
a random start k between 1 and M/m we systematically sample using a sampling interval of M/m. 
The sorting is done in serpentine order. For example, sort the schools using urbanicity in ascending 
order. Then within the first level of the urbanicity (urban), sort school size in ascending order. Within 
the second level of the urbanicity (suburban), sort the school size in descending order. In this way we 
sort the school size variable to alternate between ascending and descending sorting throughout all 
levels of the urbanicity. We do the same for free/reduced lunch by sorting the lunch variable within 
levels formed from the first two variables, again alternating between ascending and descending 
order. We do the same for the race/ethnicity variable. This sorting algorithm minimizes the change 
from one school to the next with respect to the sorting variables to make nearby schools more 
similar. After sorting, we then pick a random starting school and increase by the sampling interval to 
select the next sampled school. Between different states, we change the sorting order. For example, 
for the first state, the urbanicity is sorted in ascending order, then for the second state in descending 
order, etc. Then we do serpentine sorting for the other variables. The number of students to be 
sampled is shown in Table 3. We assume that about 95% of the selected students will participate. 
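
The serpentine sort and the systematic selection described above can be sketched in Python as follows. This is a minimal illustration and not AIR's implementation (the memo reports that SAS was used). The function names, the dictionary record layout, and the coding of each stratification variable as an ordered integer are all assumptions of this example, and the alternation of the first variable's direction between states is left out.

```python
import random
from itertools import groupby


def serpentine_sort(records, keys):
    """Sort by keys[0] ascending, then sort each later key within the
    groups formed by the earlier keys, alternating the direction from
    one group to the next."""
    ordered = sorted(records, key=lambda r: r[keys[0]])
    for depth in range(1, len(keys)):
        resorted, descending = [], False
        # Records that share values on the earlier keys are contiguous.
        prefix = lambda r: tuple(r[k] for k in keys[:depth])
        for _, group in groupby(ordered, key=prefix):
            resorted.extend(
                sorted(group, key=lambda r: r[keys[depth]], reverse=descending)
            )
            descending = not descending  # flip direction for the next group
        ordered = resorted
    return ordered


def systematic_sample(ordered, m, rng=random):
    """Select m records at a fixed interval M/m from a random start."""
    interval = len(ordered) / m
    start = rng.random() * interval
    last = len(ordered) - 1
    return [ordered[min(int(start + i * interval), last)] for i in range(m)]


# Toy frame: two urbanicity levels crossed with three school sizes.
frame = [{"urbanicity": u, "size": s} for u in (1, 2) for s in (1, 2, 3)]
for school in serpentine_sort(frame, ["urbanicity", "size"]):
    print(school["urbanicity"], school["size"])
```

In the toy frame the school size runs 1, 2, 3 within the first urbanicity level and 3, 2, 1 within the second, so neighboring schools on the list differ on as few variables as possible.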




Table 3. Smarter Balanced Student Sample 


State | Selected Grade 4 | Selected Grade 7 | Selected Grade 11 | Target Grade 4 | Target Grade 7 | Target Grade 11 
AL | 189 | 231 | 225 | 180 | 219 | 214 
CA | 1,428 | 1,302 | 1,275 | 1,357 | 1,237 | 1,211 
CT | 147 | 126 | 150 | 140 | 120 | 143 
DE | 42 | 21 | 25 | 40 | 20 | 24 
HI | 42 | 42 | 50 | 40 | 40 | 48 
ID | 84 | 105 | 125 | 80 | 100 | 119 
IA | 168 | 189 | 250 | 160 | 180 | 238 
KS | 189 | 210 | 200 | 180 | 200 | 190 
ME | 63 | 63 | 50 | 60 | 60 | 48 
MI | 441 | 462 | 575 | 419 | 439 | 546 
MO | 294 | 336 | 350 | 279 | 319 | 333 
MT | 84 | 126 | 100 | 80 | 120 | 95 
NV | 84 | 63 | 50 | 80 | 60 | 48 
NH | 63 | 63 | 50 | 60 | 60 | 48 
NC | 336 | 294 | 325 | 319 | 279 | 309 
ND | 63 | 84 | 100 | 60 | 80 | 95 
OR | 189 | 189 | 175 | 180 | 180 | 166 
PA | 441 | 462 | 425 | 419 | 439 | 404 
SC | 147 | 147 | 125 | 140 | 140 | 119 
SD | 63 | 105 | 100 | 60 | 100 | 95 
VT | 42 | 63 | 50 | 40 | 60 | 48 
WA | 294 | 231 | 250 | 279 | 219 | 238 
WV | 105 | 105 | 75 | 100 | 100 | 71 
WI | 294 | 294 | 350 | 279 | 279 | 333 
WY | 42 | 42 | 50 | 40 | 40 | 48 
Total | 5,334 | 5,355 | 5,500 | 5,071 | 5,090 | 5,231 

Total selected students: 16,189 
Total target students: 15,392 

Grade | Average Class Size | Participation Rate 
Grade 4 | 21 | 0.95 
Grade 7 | 21 | 0.95 
Grade 11 | 25 | 0.95 


Design Effects 


In a simple random sample, the information provided by any student within the sample is 
independent of the information provided by other students in the sample, because there is no 
dependency in the selection of students. In most educational contexts, however, student samples 
are clustered, with students within a cluster being more similar than would be expected from a 
random draw of students from the population. Students within a school tend to be more similar to 
one another than a random sample of students from across the state would be, due to a variety of 
factors that pull for similarities among people, including common geographic and socioeconomic 
factors. This clustering dramatically reduces the efficiency of most samples drawn in educational 
contexts and greatly increases the sampling error of estimates derived from these samples. The 
impact of the sampling procedure on the standard errors is referred to as the design effect. 


Kish (1965) popularized the concept of design effects. Design effects are described in more detail by 
Cochran (1977), Levy and Lemeshow (1999) and Lohr (1999). The traditional measure of the effect of the sample 
design is how much information is provided from the existing design relative to a sample based on a 
simple random sample. This is measured by the design effect—the ratio of the variance of a statistic 
that takes the characteristics of the sample into account over the variance of the same statistic 
based on a simple random sample of the same size—that is, the same number of individuals 
selected at random from the entire state population of students in the same grade without regard to 
school. In the case of sampling schools and then testing all students within the selected schools, as 
described here, the design effect is always greater than 1, meaning that, in terms of statistical 
efficiency, such a sample will always provide a larger standard error for any statistic than the 
standard error that could be provided from a simple random sample of the same size. 


AIR recommends a two-stage sample design which first randomly selects schools and then randomly 
selects a classroom within schools. All students within the selected classroom would be tested. The 
formula for the two-stage design effect is $Deff = 1 + (\bar{n}_c - 1)\rho_c + \bar{n}_c(\bar{c}_s - 1)\rho_s$, where $\bar{n}_c$ is the 
average number of students per school per class per form, $\bar{c}_s$ is the average number of classes per 
school per form, $\rho_c$ is the intra-class correlation within classes, and $\rho_s$ is the intra-class correlation 
within schools. For example, the design effect for Grade 4 below can be calculated as follows. We 
assume there will be an average of 21 students per class with one class per school and 5 forms 
administered within each class. This yields an average of 4.2 students per school per class per form 
($\bar{n}_c$). The average number of classes per school per form is equal to 1 ($\bar{c}_s$). Therefore, the design 
effect is equal to 1 + (4.2 - 1)(.15) + 4.2(1 - 1)(.10) = 1.48. 

The effective sample size is the actual sample size divided by the design effect. This number 
indicates the size of a simple random sample that would have the same statistical standard error as 
that produced by the actual stand-alone field test sample when the sample design is taken into 
account. Table 4 shows the anticipated design effects and effective sample sizes. The class size 
estimates in Table 3 and Table 4 are based on national estimates provided by NCES (Average Class 
Size for Public School Teachers in Elementary Schools, Secondary Schools, and Schools with 


Combined Grades, by Classroom Type [Table 8], U.S. Department of Education, National Center for 
Education Statistics, Schools and Staffing Survey [SASS], 2007-2008). 
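
The Grade 4 design effect and the resulting effective sample size per form can be checked with a short Python sketch. The function names are hypothetical. The inputs are the values quoted in this memo: 21 students per class, 5 forms, one class per school, intra-class correlations of .15 and .10, and the Grade 4 target of 5,071 students from Table 3.

```python
def design_effect(n_c, c_s, rho_c, rho_s):
    """Two-stage design effect: 1 + (n_c - 1)*rho_c + n_c*(c_s - 1)*rho_s."""
    return 1 + (n_c - 1) * rho_c + n_c * (c_s - 1) * rho_s


def effective_sample_size(n, deff):
    """Size of the simple random sample with the same standard error."""
    return n / deff


class_size, forms = 21, 5
n_c = class_size / forms                      # 4.2 students per class per form
deff = design_effect(n_c, c_s=1, rho_c=0.15, rho_s=0.10)
print(round(deff, 2))                         # 1.48

students_per_form = 5071 / forms              # Grade 4 target students per form
print(round(effective_sample_size(students_per_form, deff)))  # 685
```

With only one class per school the school-level term drops out, so the design effect is driven entirely by the within-class correlation.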


Table 4. Design Effects in the Smarter Balanced Small-Scale Trials 


Class Size | Class Intra-class Correlation | School Intra-class Correlation | Number Forms | Target # Responding Schools | Target # Responding Students | Design Effect | Student Sample per Form | Effective Sample per Form 


Power Analysis 


Power is the probability that you will be able to detect the effect you are looking for in the data. 
Power analysis strategies are described in Cohen (1969, 1988). The effective sample sizes in 
Table 4 can be used to conduct a power analysis for the small-scale trials. Let’s assume that we wish 
to compare the p-values (item difficulties) of items under two conditions: hand-scoring ($p_1$) and automated scoring 
($p_2$). We want to have a sample large enough to have a .80 probability ($1 - \beta$) of detecting a 
minimally detectable effect (MDE) of $\delta = .20$. Effect size estimates for p-values (with common n) can 
be obtained by $\delta = 2\arcsin(\sqrt{p_1}) - 2\arcsin(\sqrt{p_2})$. Are the planned SBAC sample sizes large 
enough to meet these criteria if we use a two-tailed z-test with $\alpha = .05$? Table 5 shows that the 
planned sample design would meet and exceed these criteria. For example, for grade 4 we plan to 
obtain an effective sample size equal to 678 students. This is enough students to detect an effect 
size of .15, which is smaller than the effect size of .20 that we set out to detect. We 
obtain similar results for grades 7 and 11. 
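
The detectable effect sizes in this power analysis can be approximated with the Python sketch below. It assumes two independent groups with a common effective sample size n and uses the normal approximation for the arcsine-transformed effect size; the function names are hypothetical and the calculation is only a plausibility check, not necessarily the procedure AIR used.

```python
from math import asin, sqrt
from statistics import NormalDist


def arcsine_effect_size(p1, p2):
    """Effect size for two proportions: 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2))."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def minimum_detectable_effect(n, alpha=0.05, power=0.80, tails=2):
    """Smallest effect size detectable by a z-test with n cases per condition.

    Assumption: two independent groups of equal size n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)   # critical value of the test
    z_power = z.inv_cdf(power)               # quantile for the desired power
    return (z_alpha + z_power) * sqrt(2 / n)


# Effective sample sizes per form as listed in Table 5.
for grade, n in (("Grade 4", 685), ("Grade 7", 688), ("Grade 11", 654)):
    print(grade, round(minimum_detectable_effect(n), 2))
```

Each grade yields a detectable effect of about .15, in line with the last column of Table 5.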


Table 5. Power Analysis of the Smarter Balanced Small-Scale Trials 
Grade | Minimal Detectable Standardized Effect Size | A Priori Power | Alpha | Number Tails Z-Test | Effective Sample per Form | Actual Detectable Standardized Effect Size 
Grade 4 | 0.2 | 0.8 | 0.05 | 2 | 685 | 0.15 
Grade 7 | 0.2 | 0.8 | 0.05 | 2 | 688 | 0.15 
Grade 11 | 0.2 | 0.8 | 0.05 | 2 | 654 | 0.15 

Student Sampling Weights 


School weight: The schools will be selected with a systematic simple random sample. If $m_h$ is the 
number of schools to be selected from stratum h, then the probability of the ith school within stratum 
h being selected is $p_{hi} = m_h / M_h$, where $M_h$ is the total number of schools in stratum h. The school 
weight is $w_{hi} = 1 / p_{hi}$. 

Class weight: One class will be selected within the selected school, so the selection probability of 
class j in school i from stratum h is $p_{hij} = 1 / J_{hi}$, where $J_{hi}$ is the number of classes in stratum h and 
school i. The class weight is $w_{hij} = 1 / p_{hij}$. 

Student weight: Finally, all students are selected from each selected class so that the selection 
probability of student k is $p_{hijk} = 1$. The overall selection probability for student k is $p_k = p_{hi} \, p_{hij} \, p_{hijk}$. 
The overall weight can be calculated as $w_k = 1 / p_k$. 

Normalized weight: The weight is then normalized within each stratum to the total number of 
sampled students. More specifically, suppose that there are a total of $N_h$ students in stratum h and 
the weighted sample size for stratum h is $W_h = \sum_{k \in h} w_k$. Then, the normalized weight for sampled 
student k in stratum h is $w_k^{*} = (N_h / W_h) \, w_k$. 

Results of School Sampling 


At this time AIR has drawn the sample of schools as outlined above. AIR used SAS PROC 
SURVEYSELECT. The characteristics of the sampling frame compared to the characteristics of the sample are 
contained in Tables 6 through 9. Due to rounding in the cells some of the marginal totals involving 
percentages may not sum to 100%. 


In general, the match is very good. This indicates the SBAC sample is representative of the 
population. 




Table 6. Stratification based on Urbanicity 


Grade | Urbanicity | Sampling Frame # Schools | Sampling Frame Percent | Smarter Balanced Sample # Schools | Smarter Balanced Sample Percent 
Grade 4 | Urban | 5925 | 28 | 89 | 28 
Grade 4 | Suburb | 5686 | 27 | 84 | 26 
Grade 4 | Rural | 9337 | 45 | 144 | 45 
Grade 4 | Total | 20948 | 100 | 317 | 100 
Grade 7 | Urban | 2579 | 23 | 74 | 23 
Grade 7 | Suburb | 2231 | 20 | 69 | 22 
Grade 7 | Rural | 6212 | 56 | 177 | 55 
Grade 7 | Total | 11022 | 100 | 320 | 100 
Grade 11 | Urban | 1899 | 21 | 57 | 21 
Grade 11 | Suburb | 1807 | 20 | 56 | 20 
Grade 11 | Rural | 5358 | 59 | 161 | 59 
Grade 11 | Total | 9064 | 100 | 274 | 100 




Table 7. Stratification based on School Size 


Grade | School Size | Sampling Frame # Schools | Sampling Frame Percent | Smarter Balanced Sample # Schools | Smarter Balanced Sample Percent 
Grade 4 | small | 7093 | 34 | 107 | 34 
Grade 4 | medium | 6879 | 33 | 106 | 33 
Grade 4 | large | 6976 | 33 | 104 | 33 
Grade 4 | Total | 20948 | 100 | 317 | 100 
Grade 7 | small | 3645 | 33 | 106 | 33 
Grade 7 | medium | 3712 | 34 | 109 | 34 
Grade 7 | large | 3665 | 33 | 105 | 33 
Grade 7 | Total | 11022 | 100 | 320 | 100 
Grade 11 | small | 3007 | 33 | 90 | 33 
Grade 11 | medium | 3034 | 33 | 93 | 34 
Grade 11 | large | 3023 | 33 | 91 | 33 
Grade 11 | Total | 9064 | 100 | 274 | 100 




Table 8. Stratification based on Socio-Economic Status 


Grade | SES | Sampling Frame # Schools | Sampling Frame Percent | Smarter Balanced Sample # Schools | Smarter Balanced Sample Percent 
Grade 4 | low | 6983 | 33 | 108 | 34 
Grade 4 | medium | 6981 | 33 | 104 | 33 
Grade 4 | high | 6984 | 33 | 105 | 33 
Grade 4 | Total | 20948 | 100 | 317 | 100 
Grade 7 | low | 3675 | 33 | 108 | 34 
Grade 7 | medium | 3673 | 33 | 105 | 33 
Grade 7 | high | 3674 | 33 | 107 | 33 
Grade 7 | Total | 11022 | 100 | 320 | 100 
Grade 11 | low | 3022 | 33 | 93 | 34 
Grade 11 | medium | 3021 | 33 | 87 | 32 
Grade 11 | high | 3021 | 33 | 94 | 34 
Grade 11 | Total | 9064 | 100 | 274 | 100 




Table 9. Stratification based on Ethnicity 


Grade | Ethnicity | Sampling Frame # Schools | Sampling Frame Percent | Smarter Balanced Sample # Schools | Smarter Balanced Sample Percent 
Grade 4 | White | 13963 | 67 | 213 | 67 
Grade 4 | Black | 2118 | 10 | 29 | 9 
Grade 4 | Hispanic | 3874 | 18 | 60 | 19 
Grade 4 | Other | 993 | 5 | 15 | 5 
Grade 4 | Total | 20948 | 100 | 317 | 100 
Grade 7 | White | 7577 | 69 | 222 | 69 
Grade 7 | Black | 1291 | 12 | 37 | 12 
Grade 7 | Hispanic | 1680 | 15 | 51 | 16 
Grade 7 | Other | 474 | 4 | 10 | 3 
Grade 7 | Total | 11022 | 100 | 320 | 100 
Grade 11 | White | 6493 | 72 | 195 | 71 
Grade 11 | Black | 916 | 10 | 27 | 10 
Grade 11 | Hispanic | 1331 | 15 | 42 | 15 
Grade 11 | Other | 324 | 4 | 10 | 4 
Grade 11 | Total | 9064 | 100 | 274 | 100 


References 


Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: John Wiley and Sons. 


Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press. 


Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: 
Lawrence Erlbaum Associates. 


Kish, L. (1965). Survey Sampling. New York: John Wiley and Sons. 


Levy, P. S., & Lemeshow, S. (1999). Sampling of populations: Methods and applications (3rd ed.). 
New York: John Wiley and Sons. 


Lohr, S. L. (1999). Sampling: Design and analysis. Pacific Grove, CA: Duxbury Press, Brooks/Cole 
Publishing. 




Appendix B: Small Scale Trials Item Prompts and Scoring Rubrics 


Item Number | Item Description 


43173 | The student must explain the process for how both schools can find the least number of tables needed. 


Prompt and Scoring Rubric for Item 43173 

Prompt: Describe a strategy using words that both schools could use to determine the least number of 
tables required. 

Type your answer in the space provided. 

2 Point Text: The response addresses the task in a satisfactory manner. It is complete and accurate, 
containing enough information (general or specific) to answer the question thoroughly. 

For this item, the response includes both of the correct strategies: 

1st Strategy 
• Divide the number of students at each school by 6. 
OR 
• Repeated subtraction of 6 from the number of students. 

2nd Strategy 
• Since the number of tables purchased must be a whole number, round to the next whole number to 
ensure that all students have a place to sit. 

1 Point Text: The response addresses the task in a partially satisfactory manner. It is partially complete, 
containing enough information (general or specific) to answer part of the question. 

For this item, the response includes one of the following correct strategies: 
• Divide the number of students at each school by 6. 
• Repeated subtraction of 6 from the number of students. 
• Since the number of tables purchased must be a whole number, round to the next whole number to 
ensure that all students have a place to sit. 
• Use 13 tables and 6 tables. 
• Use a total of 19 tables. 

0 Point Text: 
The response does not meet the criteria required to earn one point. The response indicates inadequate 
or no understanding of the task and/or the idea or concept needed to answer the item. It may only 
repeat information given in the test item. The response may provide an incorrect solution/response and 
the provided supportive information may be irrelevant to the item, or possibly no other information is 
shown. The student may have written on a different topic, or written “I don’t know.” 


43248 | The student must explain how the author defines "art" and support the definition with a detail from the 
text. 

Prompt and Scoring Rubric for Item 43248 

Prompt: How would the author of this passage define art? Use details from the passage to support your 
answer. 

Correct Responses: The author would probably think that anything can count as art depending on the 
individual’s view. This is shown in the text where Clara sees a tablecloth and a spider web as art. 

2 Point Text: The response includes at least one statement from each category: 
Author's Definition: 
• Art can be viewed in many different ways. 
• Art is interpreted by the individual / interpreted differently. 
• Many things count as art / art is in the eye of the beholder. 
• Everyday / ordinary objects can be art. 
• Anything with complexity / craftsmanship can be art. 
Details: 
• Clara views geometry as art / there is a relationship between geometry and art. 
• Clara views the tablecloth as art. 
• Clara sees the spider web as art / calls the spider an artist. 

1 Point Text: The response includes at least one statement from only one of the above categories. 
Sample answer: The author would probably think that art can be viewed in many different ways. 

0 Point Text: The response does not meet the criteria required to earn one point. The response indicates 
inadequate or no understanding of the task and/or the idea or concept needed to answer the item. It 
may only repeat information given in the test item. The response may provide an incorrect 
solution/response and the provided supportive information may be irrelevant to the item, or possibly, no 
other information is shown. The student may have written on a different topic or written, “I don't know.” 
Sample answer: N/A 

43280 | The student must explain why a game's rules are important to the characters in both texts. A full-credit 
answer must reference both texts. 

Prompt and Scoring Rubric for Item 43280 


Prompt: Student Directions for Parts 1 and 2 
Part 1 (35 Minutes) 


Your task: You will read four sources: two articles and two stories. Then you will answer a question 
about what you learned. In Part 2, you will write a story about someone involved in or watching a game, 
event, or sport. 


Steps to Follow: In order to plan and write your story, you will do all of the following: 
1) Examine several sources 
2) Answer a question about sources 


Directions for Beginning: You will now examine several sources. You can re-examine any of the sources 
as often as you like. 


Research Question: 




After examining the sources, use the remaining time in Part 1 to answer a question about them. Your 
answer to the question will be scored. Also, your answer will help you think about the sources you have 
read and viewed, which should help you write your story. 


The authors of Documents #3 The Invention of Kickball and #4 Casey's First Match write stories about 
games. The rules of each game are a central idea in both stories. How are the rules of each game 
important to Jacob in the story of the invention of kickball (Document #3) and to Casey in the story of the 
math competition (Document #4)? 


Use details from each story to support your answer. 


Correct Answer: The rules are important to Jacob in his story because he needs to explain them to his 
uncle. The rules are important to Casey in her story because she wants to win a competition. 


2 Point Text: 
Response includes a correct way the rules of kickball are important to Jacob with a correct way the rules 
of a mathlete competition are important to Casey. 


Jacob: 
• He needs to explain them. 
• He tells them to his uncle. 
• He shows the game to his uncle. 
• He and his friends invented the game. 
• He uses them to help his uncle. 

Casey: 
• It is her first time in a Math/Mathlete competition. 
• She doesn’t know the rules of the Mathlete competition. 
• If she doesn’t know the rules, she wouldn’t know how to play. 
• She is on a Mathlete team. 
• She wants to win/help her team. 
• She has to solve Math problems in a competition. 
• She sits on the stage/waits for her turn to solve Math problems. 


1 Point Text: 

Response includes a correct way the rules of kickball are important to Jacob but a missing or incorrect 
way the rules of a Mathlete competition are important to Casey. 

OR 

Response includes a correct way the rules of a mathlete competition are important to Casey but a 
missing or incorrect way the rules of kickball are important to Jacob. 

Sample Answer: The rules are important to Jacob in his story because he needs to explain them to his 
uncle. 


0 Point Text: 
The response does not meet the criteria required to earn one point. The response indicates inadequate 
or no understanding of the task and/or the idea or concept needed to answer the item. It may only 
repeat information given in the test item. The response may provide an incorrect solution/response and 
the provided supportive information may be irrelevant to the item, or possibly, no other information is 
shown. The student may have written on a different topic or written, “I don't know.” 
Sample Answer: N/A 


43297 | The student must explain how the structure of the text sets up the ending. A specific detail illustrating 
the setup or resolution is required for full credit. 

Prompt and Scoring Rubric for Item 43297 

Prompt: How does the structure of this text prepare the reader for the ending? Support your response 
with evidence from the text. 

Type your answer in the space provided. 

Correct Responses: The structure prepares the reader for the ending using dialogue between the pieces 
of the clock to show why the clock stopped in the first place. This sets up the ending where the farmer 
thinks his watch has gained 30 minutes overnight because it doesn’t match the clock. 

2 Point Text: 
Response includes one correct explanation about how the structure of the text prepares the reader for 
the ending with a supporting detail. 

How the structure prepares the reader for the ending: 
• The text includes dialogue/the parts of the clock argue with each other. 
• The text uses cause and effect to explain the problem. 
• The reader learns why the clock has stopped. 
• The pacing of the story suggests that the problem will be solved by the end. 
• The text sets up a conflict and a resolution. 
• The text uses foreshadowing. 

Support: 
• The clock stops because the pendulum is tired of ticking. 
• The pendulum is the only part that wants to stop working. 
• “the pendulum, who spoke thus: ‘I confess myself to be the sole cause of the present stoppage’” 
• The dial plate/other pieces of the clock urge the pendulum to work. 
• The pieces tell the pendulum how important he is to the clock. 
• “You have done a great deal of work in your time; so have we all, and are likely to do; which, 
although it may fatigue us to think of, the question is, whether it will fatigue us to do” 
• “Recollect that, although you may think of a million strokes in an instant, you are required to 
execute but one; and that, however often you may hereafter have to swing, a moment will always 
be given you to swing in.” 
• The farmer’s watch gains half an hour. 
• The farmer thinks his watch is wrong. 


1 Point Text: 
Response includes one correct explanation about how the structure of the text prepares the reader for 
the ending with a missing or incorrect supporting detail. 




Note: The student will NOT receive credit for a correct supporting detail without a correct structure. 
Sample Answer: The structure tells the reader what the lesson is going to be because the clock parts are 
arguing about it. 


0 Point Text: 
The response does not meet the criteria required to earn one point. The response indicates inadequate 
or no understanding of the task and/or the idea or concept needed to answer the item. It may only 
repeat information given in the test item. The response may provide an incorrect solution/response and 
the provided supportive information may be irrelevant to the item, or possibly, no other information is 
shown. The student may have written on a different topic or written, “I don't know.” 
Sample Answer: N/A 

43559 | The student must choose the "best" proof of the Pythagorean Theorem and provide support for the 
decision. 

Prompt and Scoring Rubric for Item 43559 

Prompt: Why is Attempt 2 the best proof? 

Type your answer in the space provided. 

Correct responses: Attempt number 2 works for any size of right triangle. 
Attempt number 1 only shows that the theorem works for a 3, 4, 5 triangle. 
Attempt number 3 only shows that the theorem works for isosceles right triangles. 

2 Point Text: 
N/A 

1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 

For this item, the response includes one of the correct explanations: 
• Attempt 2 works for any right triangle/algebraically. 
• Attempt 1 only works for a 3, 4, 5 triangle and Attempt 3 only works for isosceles right triangles. 
• Attempt 2 works for any right triangle/algebraically. Attempt 1 only works for a 3, 4, 5 triangle 
and Attempt 3 only works for isosceles right triangles. 

0 Point Text: 
The response does not meet the criteria required to earn one point. The response indicates inadequate 
or no understanding of the task and/or the idea or concept needed to answer the item. It may only 
repeat information given in the test item. The response may provide an incorrect solution/response and 
the provided supportive information may be irrelevant to the item, or possibly, no other information is 
shown. The student may have written on a different topic or written, “I don't know.” 


43397 The student must identify a technique the author uses and provide an example from the text.

Prompt and Scoring Rubric for Item 43397

Prompt: What techniques does the author use to convince the reader of the believability of the events in
the text? Provide details from the text to support your answer.


Correct Responses: The author uses a first-person narrator to tell us the facts about everything that 
happened and to let us know how he feels about Conway and Barting. In the passage, it states that 
Foley trusts Barting because he was an “honorable” and “truthful” man with whom he had served in a
war, so the reader can believe that the narrator is telling the truth about his story. 


2 Point Text: 
The response includes a correct technique and a correct supporting detail from the text. 


Techniques:
• Straightforward/casual/conversational tone
• First-person narrative
• Descriptions/characterization of Conway/Barting
• Details about Conway and Barting’s friendship
• Realistic setting/details about the time period
• Foreshadowing


Support:
• “This is a story told by the late Benson Foley of San Francisco” (only support for first person)
• “In the summer of 1881 I met a man named James H. Conway, a resident of Franklin,
Tennessee.”
• “I had known Barting as a captain in the Federal army during the civil war.”
• “Barting had always seemed to me an honorable and truthful man.”
• “the warm friendship which he expressed in his note for Mr. Conway was to me sufficient
evidence that the latter was in every way worthy of my confidence and esteem.”
• “I had in my pocket a photograph of Barting... without a mustache.”
• “it had been solemnly agreed between him and Barting that the one who died first should, if
possible, communicate with the other” (only support for foreshadowing)


1 Point Text:

The response includes one of the correct techniques without a correct supporting detail from the text.
Note: The student will NOT receive credit for a correct supporting detail without a correct technique.
Sample Answer: The author makes the story more believable by telling it in the first-person perspective,
like we are listening to the narrator tell us about his own story.


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43403 The student must write an ending for an unfinished story. Specific details and appropriate word
choice/vocabulary are required for full credit.

Prompt and Scoring Rubric for Item 43403


Prompt: The following two paragraphs are the beginning of a story about trying to have fun on a rainy 
day. Read the paragraphs. After reading the paragraphs, you will finish the story. 


Last Saturday morning, I woke with my alarm at 8 a.m. and leapt out of bed. My softball team was having
its very first game that day, and I couldn't wait. I had been practicing for weeks, and I knew I was ready. I
hurried across my bedroom to where my uniform was carefully folded and placed on a chair. It was only
then that I looked out the window. It was pouring rain! The game would be canceled for sure; I felt ready
to cry because I was so upset.


Then, I heard my mom and brother laughing downstairs. The sound made me forget my
disappointment. Curious, I wandered out of my bedroom, still in my pajamas, and made my way down
the stairs.


Now finish the story. 
Type your answer in the space provided. 


Correct Responses: I followed the laughter into the living room. My mom and my brother sat on the floor,
flipping through the book of old family photographs. I wondered what could possibly be so funny. Then,
my brother looked up and saw me standing there.


“Come here, you have to see these!” he said. I glanced out the window; the rain was still pouring down in
sheets. I thought that if I couldn't play softball today, I might as well try to have fun inside. I sat down next
to my brother, and he showed me pictures of us as babies and toddlers, playing games and making silly
faces for the camera. In a few minutes, I was laughing too. Maybe spending the day indoors wouldn't be
so bad.


2 Point Text: 

The response:
• provides appropriate and predominantly specific details or evidence
• uses appropriate word choices for intended audience and purpose


1 Point Text: 
The response:
• provides mostly general details and evidence, but may include extraneous or loosely related
details
• has a limited and predictable vocabulary that may not be consistently appropriate for the
intended audience and purpose
Sample Answer: I followed the sounds into the living room. My mom and my brother sat on the floor,
flipping through the book of old family photographs. What could possibly be so funny?


“Come here, you have to see these!” he said. I glanced out the window; the rain was still pouring down in
sheets. I thought that if I couldn't play softball today, I might as well try to have fun inside. I sat down next
to my brother, and he showed me pictures. In a few minutes, I was laughing too. Maybe spending the
day indoors wouldn't be so bad.


0 Point Text:
The response:
• includes few supporting details that may be vague, repetitive, incorrect, or interfere with the
meaning of the text
• has inappropriate vocabulary for the intended audience and purpose


43412 The student must identify a characteristic of a successful mail delivery service and provide a supporting
detail from the text.

Prompt and Scoring Rubric for Item 43412


Prompt: What does the passage suggest is important for a successful mail delivery service? Support 
your answer with a detail from the passage. 


Type your answer in the space provided. 


Correct Responses: People were looking for a mail delivery service that was always fast. The Pony 
Express did this by sending riders from Missouri to California in just eleven days. 


2 Point Text: 

What was important in a mail delivery service:
• fast / speed
• reliable / reliability / consistency / being on time
• trustworthy / trust


Supporting detail:
• Stations were built ten to fifteen miles apart / riders would change horses and then continue on.
• They promised mail would travel across the country in ten days.
• Riders rode from California to Missouri in eleven days.
• Riders traveled through bad weather.
• “No matter what ... they always delivered the mail.”
• People switched to the telegraph because it was faster.
1 Point Text: 
The response contains an example of what was important for a mail delivery service. 
OR 
The response contains a correct supporting detail. 
Sample Answer: People wanted a mail delivery service that they could trust all of the time. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43416 The student must explain how a character feels at the end of the text and support this feeling with a
detail from the text.

Prompt and Scoring Rubric for Item 43416

Prompt: Explain how Alexa most likely feels when her sister Marisa plays piano. Use information from the
text to support your explanation.


Correct Responses: Alexa feels proud of her sister. She helped Marisa practice and gave her advice, so
she wants to see her do well.


2 Point Text:
Response includes a correct feeling and a correct explanation.


Feeling:
• proud / pride
• excited / excitement
• happy / happiness
• calm / patient
• confident


Explanation:
• She talks Marisa into joining the talent show.
• She helps Marisa practice.
• She gives Marisa advice.
• She tells Marisa a trick to be less nervous.
• She encourages Marisa.
• She tells Marisa “that was really good” / “you'll be great.”
• She tells Marisa not to worry / says “don't worry” / “you'll be fine.”
• She smiles up at Marisa.


Note: Vague feelings such as “good,” “great,” or “fine” will NOT receive credit.
1 Point Text: 
Response includes a correct feeling without a correct explanation. 


Note: The student will NOT receive credit for a correct explanation without a correct feeling. 
Sample Answer: Alexa is excited to see her sister finally play. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate 
or no understanding of the task and/or the idea or concept needed to answer the item. It may only 
repeat information given in the test item. The response may provide an incorrect solution/response and 
the provided supportive information may be irrelevant to the item, or possibly, no other information is 
shown. The student may have written on a different topic or written, “I don't know.” 


43422 The student must explain how the author uses an example to support his/her main idea. For full credit,
the student must identify a different but analogous example from the text.

Prompt and Scoring Rubric for Item 43422

Prompt: Read this sentence from the text.

“The idea is that when people smell the cookies, they will feel good, and as a result they will want to buy
the house.”


Explain how this sentence relates to the author's main point. Then, give another example from the 
passage that relates to the author's point in a similar way. 


Type your answer in the space provided. 


Correct Responses: Smell is connected to people's feelings and can make them spend money on things. 
The smell of chlorine brings up pool-related memories. 


2 Point Text: 
The response relates the sentence to the main idea and adequately supports the reasoning with another 
example. Correct answers include: 


Relates to Main Idea:
• The smell connects to memory / emotion / feelings.
• The smell can make you feel good.
• The smell makes you feel things.
• The smell makes you feel a certain way.
AND
• The smell gets people to buy things / spend money / act a certain way.
• The smell influences people / decisions.


Note: Students must explain that smell is both linked to memory / emotions AND that smell influences
how people act.


Another Example:
• Chlorine makes people think of pool-related memories.
• Babies like garlic if they're introduced to it before birth.
• Grocery stores smell like baked goods.
• Shoe companies make their shops smell good.
• Malodorants repel people.
• Police use stink bombs to make people leave / break up riots.
• Skunks use smell as a defense.
1 Point Text: 


The response tells how the sentence relates to the main idea but does not give another example. 


Note: Students will NOT receive credit for a correct example without a correct relation. 
Sample Answer: Smell and emotions are related because smells can influence your feelings and 
decisions. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43435 The student must identify the attitude of persons named in the text and provide evidence of that
attitude.

Prompt and Scoring Rubric for Item 43435

Prompt: Explain what the text suggests about the McWhirters' attitude toward the public. Provide
evidence from the text that supports this inference.


Type your answer in the space provided.


Correct Responses: The McWhirters show concern for the public by removing records that encourage
unsafe or immoral behavior.


2 Point Text:
Response includes a correct attitude with correct evidence.


Attitude:
• They respected / served / were concerned / cared about the public / understood what the public
wanted and supplied it.
• They were interested in informing / educating the public with facts.


Evidence:
• They “scoured the globe” / worked hard to collect their data.
• They responded to public demand / thirst for unusual knowledge.
• They sought out “increasingly obscure, little-known facts.”
• They included bizarre / strange / weird / wacky records.
• Their website covers facts for a wide range of interests.
• They invited the public to submit applications for new records.
• Records have been removed for ethical / moral / safety reasons.
1 Point Text: 


Response includes a correct attitude without evidence. 


Note: The student will NOT receive credit for correct evidence without a correct attitude. 
Sample Answer: The McWhirters wanted to serve the public. That's why they put out their book. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43445 The student must explain how the author's use of chronological order helps provide characterization and
support the answer with an example from the text.

Prompt and Scoring Rubric for Item 43445

Prompt: How does the author's use of chronological order in the text highlight Sojourner Truth's personal
characteristics?

Use examples from the text to support your answer.
Type your answer in the space provided.


Correct Responses: The author shows how determined Sojourner Truth was over the years of her life. 
This is particularly evident when the author mentions that even though she didn't have a formal 
education, she kept standing up and fighting for what was right. 


2 Point Text: 
Response includes the following correct use of chronological order to highlight Sojourner Truth's personal 
characteristics with correct support. 


Use:
• It shows how her life and background influenced her.
• It shows that she was determined / strong / brave / hardworking throughout her life.
• It shows that she never gave up.
• It shows the challenges she faced and overcame during her life.
• It shows her growth/change over time.


Support:
• Lack of education did not stop her.
• She continued working into her eighties.
• She went from being a slave to inspiring many people.
• She was the first former slave to win a court case.
• She was one of the African American pioneers.


1 Point Text: 

Response includes a correct use of chronological order to highlight Sojourner Truth’s personal 
characteristics without a correct support. 

Note: The student will NOT receive credit for a correct support without a correct use.

Sample Answer: The author uses chronological order to show how determined Sojourner Truth was 
through her entire life. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43446 The student must write a brief argumentative essay for or against public transportation. A full-credit
response must include evidence from a provided table and use appropriate vocabulary.

Prompt and Scoring Rubric for Item 43446


Prompt: Joseph's English teacher assigned each student the task of writing an argumentative essay 
requiring a research component. The following table contains Joseph's notes from his research about the 
merits of using public transit versus driving a private vehicle. Read his notes. Then compose a brief 
argumentative essay. 


Public Transit | | Private Vehicle |
Saves money in gas, car maintenance, and repairs | Have to sacrifice personal space | Is convenient because car is only steps away | May be difficult to find
Saves frustration by avoiding heavy traffic | Can be difficult to follow schedule | Is more comfortable than public transit | Have to deal with heavy
Is better for the environment | Can be unreliable | Can go anywhere and can get there directly | Can only use time in the vehicle for driving
Can use time on transit productively | Lacks the privacy and comfort of a car | Is not limited to areas that public transit serves | Costs more than public transportation
 | May need to tolerate inconsiderate people | | Uses more natural resources


Compose a brief argumentative essay either in support of, or in opposition to, using public 
transportation. Use evidence from the table to support your answer. 


Type your answer in the space provided. 


Correct Responses: Driving a private car is far superior to the use of public transportation. In many
situations, it does not make any sense to take public transit, especially when outside of a big city. Far
more people in the country use private cars instead of public transportation. This is likely because cars 
are more convenient for most citizens. For instance, while one's car may only be a few steps away, many 
residents live a mile or more from the nearest bus stop. The inconvenience of distance is only made 
worse by the unreliable nature of public transit in general, and buses in particular. 

Furthermore, while some may argue that cars are not worth the extra cost, the time that one saves 
by driving more than makes up for the increased cost. As they say, time is money. Similarly, without a car 
one is limited to the geographical area that public transit serves, whereas a private car allows the driver 
the freedom to travel anywhere he or she desires. And you can't put a price on freedom. 

Though public transit does have its advantages, it is mainly useful for those in heavily serviced 
areas. For the rest of us, cars are the clear choice for their convenience and the freedom they allow. 


2 Point Text:
Evidence/Elaboration:


The response:
• provides appropriate and predominantly specific details or evidence
• uses appropriate word choices for intended audience and purpose


1 Point Text:
Evidence/Elaboration:


The response:
• provides mostly general details and evidence, but may include extraneous or loosely related details
• has a limited and predictable vocabulary that may not be consistently appropriate for the intended
audience and purpose
Sample Answer: There are tons of bus stops near my house. However, I like to use the ones that I can
walk to because it makes it easier to catch the bus to school. I know all of the bus schedules so for me
the buses are easy to figure out. There's even an application for mobile devices that lets you see when
the next bus is coming. This can be really useful when the buses are being unreliable like they can be.

I always use the bus which means that I choose public transportation. Even if I could have a car I
wouldn't want it. Cars are hard to drive and can be really dangerous. I'd bike instead of driving if I had the
choice.

I like the fact that I get to use the bus and train every day. It makes my trip to school interesting and
enjoyable. Sometimes I even read there. I couldn't do that in a car. I think everyone should use public
transportation because it's so much better than driving.


0 Point Text:
Evidence/Elaboration:


The response:
• includes few supporting details that may be vague, repetitive, incorrect, or interfere with the meaning
of the text
• has inappropriate vocabulary for the intended audience and purpose

Sample Answer: Public transportation is the best.


43468 The student must use the provided documents to identify three scientific inaccuracies in a science
fiction story.

Prompt and Scoring Rubric for Item 43468


Prompt: 
Part 1 (35 Minutes) 


You will read several documents about science and science fiction. Then you will answer a question 
about what you have read. In Part 2, you will write a narrative story about living on the moon or traveling 
to the moon. You will use current scientific knowledge to shape the story. 


Steps to Follow

In order to plan and write your narrative story, you will do the following:
1. Examine several documents.
2. Answer a question about the documents.

Directions for Beginning

You will now examine several documents. You can re-read the documents as often as you like.

Research Question


After examining the research documents, use the remaining time in Part 1 to answer a question. Your 
answer to this question will be scored. Also your answer will help you think about the research 
documents you have read, which will help you write your narrative story. 


Read all the documents. When you are finished reading the documents provided, review the story, 
Document #2: Lost on the Moon. This story was written many years ago and used scientific information 
about the Moon during that time. There are many misunderstandings about the Moon in this science 
fiction story because we have newer scientific information today. Explain at least three points in Lost on 
the Moon that are incorrect based on today's scientific information provided in the other documents 
about the Moon. Use information from the other documents provided to support your answer. 


Type your answer in the space provided. 


2 Point Text: 
A correct response includes at least three of the following points about the Moon that are incorrect 
based on today's scientific information: 


• speed of travel to the Moon
• amount of gravity
• ability to grow food / plants / vegetation
• Moon supporting life / people on the Moon
• diamonds on the Moon
• atmosphere / oxygen on the Moon
• water on the Moon


1 Point Text: 
Response includes only two points about the Moon that are incorrect based on today's scientific 
information. 


Note: The student will NOT receive credit for a response that includes only one point about the Moon 
that is incorrect based on today's scientific information. 
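
The point thresholds for this item can be sketched as a small function. This is only an illustration: it assumes a human scorer has already counted how many distinct rubric points the response correctly identifies, and the name `score_moon_response` and its parameter are hypothetical, not part of the assessment.

```python
def score_moon_response(correct_points: int) -> int:
    """Map a count of correctly identified inaccuracies to a 0-2 score.

    correct_points is assumed to be the number of distinct rubric points
    (speed of travel, gravity, water, ...) that a scorer has accepted.
    """
    if correct_points < 0:
        raise ValueError("correct_points must be non-negative")
    if correct_points >= 3:
        return 2  # at least three points: full credit
    if correct_points == 2:
        return 1  # exactly two points: partial credit
    return 0      # one point or none: no credit


if __name__ == "__main__":
    for n in range(5):
        print(n, score_moon_response(n))
```

The sketch makes the rubric's note explicit: a response with a single correct point falls through to the 0-point case.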


0 Point Text:
The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43491 The student must explain how one document is different from the others. The student must also identify
evidence the author uses to strengthen his claim.

Prompt and Scoring Rubric for Item 43491


Prompt: Explain how the editorial “For Love of the City” is different from the other sources. Then, explain 
how the author strengthens his claim that “cities are beneficial for living.” 


Type your answer in the space provided. 


2 Point Text: 
A full-credit response includes a correct explanation with a correct technique. 


How the editorial is different:
• The author is against rural living / suggests problems with rural living.
• The author portrays rural living as inferior to living in the city.
• The text is a persuasive / opinion piece.
• The text is a letter to an editor.
• The author gives his own opinions.


How author strengthens his claim:
• presents points and counterpoints
• disproves assumptions
• cites research / experts
• links cities to human nature
• humans need interaction with other humans
• links cities to historical achievements / progress
• cities are a source of innovation
• implies environmental / moral / ethical / responsibilities and concerns
• appeals to the reader's sense of responsibility
• rural living uses more fuel / electricity / resources
• rural living creates more pollution
• city living saves money


1 Point Text: 
Response includes only either an explanation of how the editorial is different or how the author 
strengthens his claim. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43497 The student must write a conclusion to an argumentative essay while acknowledging listed
counter-arguments. A full-credit response includes specific details and appropriate word choice.

Prompt and Scoring Rubric for Item 43497


Prompt: 

A local newspaper has written articles on restaurants in your neighborhood. Your teacher has asked you
to write an argumentative essay on a topic concerning restaurants, using the articles as sources. The
following is the beginning of your argumentative essay considering whether restaurants should offer
more vegetarian-friendly options. After reading the sentences, finish the paragraph by including the
counter-arguments below.


I think that restaurants should offer more options for people who prefer a vegetarian diet. Many
people these days choose to stay away from meat products. A recent survey found that 38% of people 
are vegetarians. However, there are few items on restaurant menus for vegetarians to choose from, and 
the few options that are available are often much more costly than the meat-based alternatives. 


Now finish the argumentative essay while acknowledging these counter-arguments: 


Counter-Arguments:
• Vegetarian products can be expensive.
• Almost every restaurant offers salads for vegetarians.
• There are more people who eat meat than there are vegetarians.


Type your answer in the space provided. 


Correct Responses: 

I think that restaurants should offer more options for people who prefer a vegetarian diet. Many people
these days choose to stay away from meat products. However, there are few items on restaurant menus
for vegetarians to choose from. Most restaurants offer salads but people can get tired of eating a salad
every day. And even though some vegetarian options can be expensive, I think that if restaurants offered
these products, more vegetarians would eat there!


2 Point Text: 

The response:

• provides appropriate and predominantly specific details or evidence
• uses appropriate word choices for intended audience and purpose


1 Point Text: 

The response:

• provides mostly general details and evidence, but may include extraneous or loosely related details
• has a limited and predictable vocabulary that may not be consistently appropriate for the intended
audience and purpose

Sample Answer: I think that restaurants should offer more options for people who prefer a vegetarian
diet. There are few items on restaurant menus for vegetarians to choose from. Most restaurants say that
they offer salads and that vegetarian options can be expensive. I think that if restaurants offered other
products, more vegetarians would eat there!


0 Point Text:

The response:

• includes few supporting details that may be vague, repetitive, incorrect, or interfere with the meaning
of the text
• has inappropriate vocabulary for the intended audience and purpose


43546 The student must explain the process for determining the composition of the crown.


Prompt: Explain in words the process you could use to determine whether the crown is all gold or a mix of 
gold and silver. 


Type your answer in the space provided. 


2 Point Text: 
N/A 


1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 


Prompt and Scoring Rubric for Item 43546

For this item, the response includes one of the correct explanations:

- I would first figure out the volume of a pure gold mass that is equal to the King's crown's mass and then
compare the two masses. Then I would set up two equations with two unknowns and solve. The
unknowns would be the amount of gold and the amount of silver in the crown.
- I would set up a proportion to find out the mass of pure gold then solve the proportion.
- Find volume of 1.8 kg of pure gold. Compare to volume of crown.
- Find mass of 125 cm^3 of pure gold. Compare to mass of crown.
- Create a proportion with the crown's mass and volume compared to silver's mass and volume. Create a
proportion with the crown's mass and volume compared to gold's mass and volume.
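
The comparisons listed above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the item's official solution: the crown's mass (1.8 kg) and volume (125 cm^3) are inferred from the rubric wording because the item stem is not reproduced here, the densities are rounded handbook values (gold about 19.3 g/cm^3, silver about 10.5 g/cm^3), volumes are assumed to add when the metals are mixed, and all function names are hypothetical.

```python
GOLD_DENSITY = 19.3    # g/cm^3, assumed rounded handbook value
SILVER_DENSITY = 10.5  # g/cm^3, assumed rounded handbook value


def pure_gold_volume(mass_g: float) -> float:
    """Volume (cm^3) that the given mass would occupy if it were pure gold."""
    return mass_g / GOLD_DENSITY


def pure_gold_mass(volume_cm3: float) -> float:
    """Mass (g) that the given volume would have if it were pure gold."""
    return volume_cm3 * GOLD_DENSITY


def gold_silver_split(mass_g: float, volume_cm3: float) -> tuple[float, float]:
    """Solve the two equations in two unknowns:

        m_gold + m_silver = mass_g
        m_gold / GOLD_DENSITY + m_silver / SILVER_DENSITY = volume_cm3
    """
    m_gold = (volume_cm3 - mass_g / SILVER_DENSITY) / (
        1.0 / GOLD_DENSITY - 1.0 / SILVER_DENSITY
    )
    return m_gold, mass_g - m_gold


if __name__ == "__main__":
    crown_mass_g = 1800.0     # assumed from "1.8 kg" in the rubric
    crown_volume_cm3 = 125.0  # assumed from "125 cm^3" in the rubric

    print("Volume if pure gold (cm^3):", pure_gold_volume(crown_mass_g))
    print("Mass if pure gold (g):", pure_gold_mass(crown_volume_cm3))
    gold, silver = gold_silver_split(crown_mass_g, crown_volume_cm3)
    print("Estimated gold (g):", gold, "silver (g):", silver)
```

Under these assumed values, 1.8 kg of pure gold would occupy roughly 93 cm^3, which is less than 125 cm^3, so the comparison would point to a mix of gold and silver.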


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43551 The student must provide support from the stem for why card A is the best card for the game.

Prompt and Scoring Rubric for Item 43551

Prompt: Explain why Bingo Card A is the best card to use for this game.

Type your answer in the space provided.


2 Point Text:
N/A


1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 


For this item, the response includes one of the correct explanations: 


Card A contains middle numbers because they have more combinations. 

Card A contains more likely numbers because they have more combinations. 

Card A contains more likely numbers because they have more different ways. 

Card A contains more likely sums. 

Card B contains 1 and 17, which are impossible, so this card cannot win. Card C contains 
extreme and unlikely/less likely numbers because they have few combinations. 

Card B contains 1 and 17, which are impossible, so this card cannot win. Card C has higher and 
lower numbers. 

Card B contains 1 and 17, which are impossible, so this card cannot win. Card C has 16 which is 
the highest/hardest. 

Card B contains 1 and 17, which are impossible, so this card cannot win. Card C has 2 which is 
the lowest/hardest. 
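
The "more combinations" reasoning can be checked by counting outcomes. The sketch below assumes the game adds two numbers that each range from 1 to 8; that setup is not stated in this excerpt and is inferred only from the rubric treating 1 and 17 as impossible, 2 as the lowest sum, and 16 as the highest. The names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: two number generators (dice, spinners, ...) each showing 1..8.
SIDES = 8


def sum_counts(sides=SIDES):
    """Count how many ordered pairs (a, b) produce each possible sum."""
    return Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))


counts = sum_counts()
for total in sorted(counts):
    print(f"sum {total:2d}: {counts[total]} combination(s)")

most_likely_sum = max(counts, key=counts.get)
```

Under this assumption the middle sums have the most combinations and the extreme sums have only one each, matching the explanations credited above.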


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43552 The student must explain the process for how to find the new radius.


Prompt: Use either words or an equation to tell or show the steps you took to determine the new radius. 


2 Point Text: 
N/A 


1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 


Prompt and Scoring Rubric for item 43552


For this item, the response includes one of the correct strategies:

- Find the volume of the standard tank. Set the combined volume formulas of cylinder and sphere
  equal to twice the volume of the standard tank, which gives you the equation: 10πr² + (4/3)πr³ = 252π.
  Then you can use trial and error to determine which radius makes this equation true.
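
The trial-and-error step can be made systematic with bisection. This is a minimal sketch, assuming the equation reads 10πr² + (4/3)πr³ = 252π, that is, a cylinder of height 10 plus a full sphere of the same radius with a target volume of 252π. The factor of π on the right-hand side and the function names are assumptions, not taken from the item itself.

```python
import math

# Assumption: the target is twice the standard tank's volume, read as 252*pi.
TARGET_VOLUME = 252 * math.pi


def tank_volume(r, height=10.0):
    """Cylinder of the given height plus a full sphere, both of radius r."""
    return math.pi * r**2 * height + (4.0 / 3.0) * math.pi * r**3


def solve_radius(target=TARGET_VOLUME, lo=0.0, hi=10.0, tol=1e-6):
    """Bisection: narrow [lo, hi] until the volume at the midpoint hits the target.

    Works because tank_volume increases with r on this interval.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tank_volume(mid) < target:
            lo = mid      # radius too small, search the upper half
        else:
            hi = mid      # radius large enough, search the lower half
    return (lo + hi) / 2


new_radius = solve_radius()
print(f"new radius is approximately {new_radius:.2f}")
```

Bisection is simply an organized version of the guess-and-check the rubric describes; a student testing a few radii by hand would arrive at roughly the same value.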


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43555 The student must explain the process for how to find the lowest cost.


Prompt: Explain a strategy Max could use to achieve the lowest possible cost. 


Type your answer in the space provided. 


Correct responses: Increase the number of large taxis 

and 

decrease the number of empty seats 

Making sure that there are no empty seats will allow Max to save money. If there are 9 large taxis and 3 
small taxis, it will cost $687 


2 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 


Prompt and Scoring Rubric for item 43555

For this item, the response includes one or both of the correct strategies:


1st Strategy:
- Increase the number of large taxis.
OR
- Repeated subtraction of 7.


2nd Strategy:
- Use 3 small taxis or minimize the number of empty seats in taxi cabs.


1 Point Text: 

The response addresses the task in a partially satisfactory manner. It is partially complete, containing 
enough information (general or specific) to answer part of the question. 

For this item, the response includes one of the correct strategies:


- Increase the number of large taxis to 9.

- Repeated subtraction of 7.

- Use 3 small taxis or minimize the number of empty seats in taxi cabs.

- Use 3 small taxis and 9 large taxis. (No further explanation given)

- Use any number of small and large taxis that has at least 75 total seats and has more large than small
  taxis.

- Find unit rate of both taxis and compare. 

Sample Answer: Using as few taxis as possible will be the cheapest. Use all large taxis. 




0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


The student must state if Anne's method is appropriate for the situation and provide support for the 
decision. 


Prompt: Would it make sense to use Anne's method when determining the temperature to decide what to 
wear outside? Why or why not? 


Correct Responses: Yes, Anne's method is appropriate for determining what to wear outside based on 
the weather. The exact answer is not needed because what you wear is not based on an exact number. 


2 Point Text: 
N/A 


1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 


Prompt and Scoring Rubric for item 43557

For this item, the response includes one of the correct explanations:

- Yes, an exact answer is not necessary.
- Yes, temperature is within a few degrees/only a few degrees off.
- Yes, estimate is close to the actual answer.
- Yes, an approximate answer is close to the actual answer.
- Yes, Anne's method is close enough.
- Yes, the temperature is close enough.
- Yes, the temperature is within a close range.
- Yes, it is ok to not have an accurate temperature.


Note: “Yes, it is close enough for most purposes” (repeating the stem) does not receive credit. 
Sample Answer: Yes, Anne's method is close enough. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”




43564 The student must give the dimensions of a rectangle with a perimeter of 18.


Prompt: SPACE 2 


2 Point Text: 
N/A 


1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 


For this item, the response includes one of the correct explanations: 


Prompt and Scoring Rubric for item 43564


The rectangle has sides of 6 feet and 3 feet. 

A rectangle with sides 6 feet and 3 feet would have an area of 18 square feet and a perimeter of 
18 feet. 

The rectangle has a length of 6 feet and a width of 3 feet. 

The rectangle has a width of 6 feet and a length of 3 feet. 

The rectangle has dimensions of 6 feet and 3 feet. 
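
The credited dimensions can be confirmed by a brute-force search. This is a minimal sketch, assuming the item asks for whole-number side lengths with an area of 18 square feet and a perimeter of 18 feet, as the second explanation above suggests; the function name is hypothetical.

```python
def rectangles_with(area, perimeter):
    """Return (length, width) pairs of whole numbers matching both measures.

    Assumes an even perimeter so that length + width is a whole number.
    """
    half = perimeter // 2              # length + width
    found = []
    for width in range(1, half // 2 + 1):
        length = half - width          # forces the perimeter to match
        if length * width == area:     # keep only pairs with the right area
            found.append((length, width))
    return found


matches = rectangles_with(area=18, perimeter=18)
print(matches)
```

The search tries each possible width once, so the longer side is always listed first and each rectangle appears only once.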

0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”

Sample Answer: It is not possible to create an 18 square foot rectangle with an 18 foot perimeter. 


43572 The student must provide support for why the line is a line of symmetry. 


Prompt: Explain why the line you drew is a line of symmetry. 




Type your answer in the space provided. 


2 Point Text:
N/A


Prompt and Scoring Rubric for item 43572

1 Point Text:
The response addresses the task in a satisfactory manner. It is complete and accurate, containing
enough information (general or specific) to answer the question thoroughly.

For this item, the response includes one of the correct explanations:


- The sides are the same.
- The edges will match.
- The line is down the middle and there is only one way to go down the middle.
- The line is down the middle and it creates equal parts.
- It is folded into matching parts.
- It will fold evenly together.
- It will fold into the same parts.
- When you fold the leaf in half, the parts are the same.
- It is in halves.
- The leaf is divided equally in half.
- It divides the leaf into the same/equal parts.


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


The student must state if Anne's method is appropriate for the situation and provide support for the 
decision. 


Prompt: Would it make sense to use Anne's method when determining the temperature of an oven? Why 
or why not? 


Correct Responses: No, in cooking an exact answer is necessary in order to follow a recipe. Anne's
method is not close enough to the exact temperature of an oven.


2 Point Text: 
N/A 


1 Point Text: 
The response addresses the task in a satisfactory manner. It is complete and accurate, containing 
enough information (general or specific) to answer the question thoroughly. 

Prompt and Scoring Rubric for item 43639

For this item, the response includes one of the correct explanations:

- No, in cooking an exact answer is necessary in order to follow a recipe.
- No, Anne's method is not close enough to the exact temperature of an oven.
- No, an approximation is not appropriate.
- No, an estimate is not appropriate.
- No, in cooking a right temperature is necessary.
- No, the temperature cannot be a little over or under.
- No, Anne's method is not accurate to find the temperature of an oven.


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”




43707 The student must identify the purpose of an author's final paragraphs and provide a detail that supports
that purpose.


Prompt: Explain the purpose of the last two paragraphs in the passage. Then, explain how the author
supports that purpose.


Correct Responses: The author says that it is important to keep inventing. It is important because 
zippers can have problems. 


2 Point Text: 
Response includes the following correct purpose with a correct support. 


Prompt and Scoring Rubric for item 43707

Purpose:
- To explain that it is important to keep inventing
- To explain that good ideas are only the beginning / coming up with just an idea is easy
- To suggest that the reader may invent something faster / easier / cheaper
- It could be the reader's new idea that holds things together in the future.
- To make the reader think about future inventions


Support:
- Zippers have flaws / problems.
- Zippers are imperfect / not perfect.
- Zippers break.
- People can improve zippers.
- Zippers are more complicated than they seem.
- Zippers can be tricky.
- Inventions are important to the future.
- Inventions must work to be useful / it's important for people to come up with inventions that
  work.


1 Point Text: 
Response includes the correct purpose listed above with an incorrect or missing support. 


Note: The student will NOT receive credit for a correct support without a correct purpose.
Sample Answer: The author says it is important to keep inventing. 


0 Point Text:

The response does not meet the criteria required to earn one point. The response indicates inadequate
or no understanding of the task and/or the idea or concept needed to answer the item. It may only
repeat information given in the test item. The response may provide an incorrect solution/response and
the provided supportive information may be irrelevant to the item, or possibly, no other information is
shown. The student may have written on a different topic or written, “I don't know.”


43964 The student must write an ending for an unfinished story. Specific details and appropriate word
choice/vocabulary are required for full credit.




Prompt: 
Read the following paragraphs about a student giving a speech to the rest of her class. Then, complete 
the task that follows. 


Sitting and watching Eric give his speech, Kate knew she was next. She dreaded the eventual trek to 
the front of the room to face her peers and speak. She would have much preferred to read someone 
else's words, not her own. She felt them so closely, knew them by heart. But what if they disappeared? 
What if she couldn't remember once she stared into the eyes of those around her? 


“Kate? Kate, it's your turn.” 

“Please, can I go to the bathroom?” Kate squeaked, in one final attempt to escape.

“Once you're finished with your speech. Now come on,” the teacher responded.

Knees shaking, Kate walked to the front of the room. Her stomach turned, twisted, and flipped around
and her brain darted from one thought to the next. Her eyes focused on her teacher, then she inhaled
and began.


Now, complete the story.

Type your answer in the space provided.


Prompt and Scoring Rubric for item 43964

Correct Responses: The first few words were the hardest. Her tongue felt like sandpaper and the words
came out slowly, with a rasp and a scratch. But then Kate glanced around at the faces surrounding her.
Every gaze was focused on her. But instead of feeling scared, Kate felt empowered. So many people
were listening to what she had to say. Kate stood up straighter, spoke louder and felt more free. Her
words were important; her fear was not.


2 Point Text: 

The response:
- provides appropriate and predominately specific details or evidence
- uses appropriate word choices for intended audience and purpose


1 Point Text:
The response:
- provides mostly general details and evidence, but may include extraneous or loosely related
  details
- has a limited and predictable vocabulary that may not be consistently appropriate for the
  intended audience and purpose
Sample Answer: Kate's tongue felt like sandpaper and the words came out slowly. Kate stood up
straighter, spoke louder and felt more free. Her words were important; her fear was not.


0 Point Text:
The response:
- includes few supporting details that may be vague, repetitive, incorrect, or interfere with the
  meaning of the text
- has inappropriate vocabulary for the intended audience and purpose


Prompt and Scoring Rubric for item 43284


The student must write an original story using information from the provided sources about games and 
sports. A full-credit response uses proper style, narrative techniques, and vocabulary. 


Prompt: You have 70 minutes to review your sources, plan, draft, and revise your story. You may refer to 
the sources. You may also refer to the answer you wrote to the question in Part 1, but you cannot change 
the answer. Now read your assignment and the information about how your story will be scored; then 
begin your work. 


Your Assignment 
You have learned about different games people like to play and win from the photo, articles, and stories.
You will now write a story from the point of view of someone involved in or watching a game, event, or
sport.


Use the information from the sources about games and sports to write a narrative story. You should 
present factual information about the activity and also develop characters and a plot. 


Story Scoring 


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


Now begin work on your story. Manage your time carefully so that you can: 


- plan your story
- write your story
- revise and edit your story


REMEMBER: A well-written narrative story: 


- has a setting, narrative, and/or characters
- has a plot with a beginning, middle, and end
- uses clear language that suits your purpose
- follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 
Establishment of Narrative Focus and Organization 


The narrative, real or imagined, is clearly focused and maintained throughout:
- effectively establishes a setting, narrator and/or characters, and point of view

The narrative, real or imagined, has an effective plot helping create a sense of unity and completeness:
- consistent use of a variety of transitional strategies to clarify the relationships between and
  among ideas
- logical sequence of events from beginning to end
- effective opening and closure for audience and purpose


Development/Elaboration

The narrative, real or imagined, provides thorough and effective elaboration using details, dialogue, and
description:
- effective use of a variety of narrative techniques that advance the story or illustrate the
  experience

The narrative, real or imagined, clearly and effectively expresses experiences or events:
- effective use of sensory, concrete, and figurative language clearly advance the purpose


3 Point Text: 
Establishment of Narrative Focus and Organization 


The narrative, real or imagined, is adequately focused and generally maintained throughout:
- adequately establishes a setting, narrator and/or characters, and/or point of view

The narrative, real or imagined, has an evident plot helping to create a sense of unity and completeness,
though there may be minor flaws and some ideas may be loosely connected:
- adequate use of a variety of transitional strategies to clarify the relationships between and
  among ideas
- adequate sequence of events from beginning to end
- adequate opening and closure for audience and purpose


Development/Elaboration

The narrative, real or imagined, provides adequate elaboration using details, dialogue and description:
- adequate use of a variety of narrative techniques that generally advance the story or illustrate
  the experience

The narrative, real or imagined, adequately expresses experiences or events:
- adequate use of sensory, concrete, and figurative language generally advance the purpose




2 Point Text: 
Establishment of Narrative Focus and Organization 


The narrative, real or imagined, is somewhat maintained and may have a minor drift in focus:
- inconsistently establishes a setting, narrator and/or characters, and/or point of view

The narrative, real or imagined, has an inconsistent plot, and flaws are evident:
- inconsistent use of transitional strategies and/or little variety
- uneven sequence of events from beginning to end
- opening and closure, if present, are weak
- weak connection among ideas


Development/Elaboration

The narrative, real or imagined, provides uneven, cursory elaboration using partial and uneven details,
dialogue, and description:
- narrative techniques, if present, are uneven and inconsistent

The narrative, real or imagined, unevenly expresses experiences or events:
- partial or weak use of sensory, concrete, and figurative language that may not advance the
  purpose


Conventions
The narrative, real or imagined, demonstrates an adequate command of conventions:
- errors in usage and sentence formation but no systematic pattern of errors is displayed and
  meaning is not obscured
- adequate use of punctuation, capitalization, and spelling


1 Point Text: 
Establishment of Narrative Focus and Organization 


The narrative, real or imagined, may be maintained but may provide little or no focus:
- may be very brief
- may have a major drift in focus
- focus may be confusing or ambiguous


The narrative, real or imagined, has little or no discernible plot: 


- few or no transitional strategies are evident
- frequent extraneous ideas may intrude


Development/Elaboration

The narrative, real or imagined, provides minimal elaboration using little or no details, dialogue, and/or
description:
- use of narrative techniques is minimal, absent, incorrect, or irrelevant

The narrative, real or imagined, expression of ideas is vague, lacks clarity, or is confusing:
- uses limited language
- may have little sense of purpose


Conventions
The narrative, real or imagined, demonstrates a partial command of conventions:
- frequent errors in usage may obscure meaning
- inconsistent use of punctuation, capitalization, and spelling


0 Point Text:
Insufficient, illegible, in a language other than English, incoherent, off-topic, or off-purpose writing


Prompt and Scoring Rubric for item 43334

The student must use information from the provided documents to write a letter about the importance of
visiting the dentist. A full-credit response uses proper style, organization, reasoning, and vocabulary.

Prompt: You have 70 minutes to review your sources, plan, draft, and revise your letter. You may refer to
the sources. Read your assignment and the information about how your letter will be scored; then begin
your work.


Your Assignment 

Your friend tells you he has a dentist's appointment. This is his first dentist appointment and he doesn't 
know what to expect. You decide to write a letter to your friend informing him of what he can expect at 
the dentist's office. 


In Your Letter 


Write a well-organized, multi-paragraph letter explaining why it is important to visit the dentist regularly
and what to expect at the dentist's office. Be sure to include details from the articles to support your
explanation.


Now begin work on your letter. Manage your time carefully so that you can: 
- plan your letter
- write your letter
- revise and edit for a final draft




Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


REMEMBER: A well-written informational letter: 


- has a clear main idea
- is well-organized and stays on the topic
- provides reasoning and evidence to support your topic
- uses clear language that suits your purpose
- follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 
Statement of Purpose/Focus and Organization 
The response is fully sustained, and consistently and purposefully focused:
- controlling idea or main idea of a topic is clearly stated, focused, and strongly maintained
- controlling idea or main idea of a topic is introduced and communicated clearly within the
  purpose, audience, and task


The response has a clear and effective organizational structure creating a sense of unity and
completeness:
- consistent use of a variety of transitional strategies to clarify the relationships between and
  among ideas
- logical progression of ideas from beginning to end
- effective introduction and conclusion for audience and purpose


Evidence/Elaboration
The response provides thorough and convincing support/evidence for the controlling idea or main idea
that includes the effective use of sources, facts, and details:
- use of evidence from sources is integrated, comprehensive, and relevant
- effective use of a variety of elaborative techniques


The response clearly and effectively expresses ideas, using precise language:
- use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose


3 Point Text: 


Statement of Purpose/Focus and Organization 
The response is adequately sustained and generally focused:
- controlling idea or main idea of a topic is clear and mostly maintained, though some loosely
  related material may be present
- some context for the controlling idea or main idea of the topic is adequate within the purpose,
  audience, and task


The response has an evident organizational structure and a sense of completeness, though there may
be minor flaws and some ideas may be loosely connected:
- adequate use of transitional strategies with some variety to clarify the relationships between and
  among ideas
- adequate progression of ideas from beginning to end
- adequate introduction and conclusion


Evidence/Elaboration
The response provides adequate support/evidence for controlling idea or main idea that includes the
use of sources, facts, and details:
- some evidence from sources is included, though citations may be general or imprecise
- adequate use of some elaborative techniques


The response adequately expresses ideas, employing a mix of precise with more general language:
- use of domain-specific vocabulary is generally appropriate for the audience and purpose


2 Point Text: 

Statement of Purpose/Focus and Organization 

The response is somewhat sustained and may have a minor drift in focus:
- may be clearly focused on the controlling or main idea, but is insufficiently sustained, or
- controlling idea or main idea may be unclear and/or somewhat unfocused


The response has an inconsistent organizational structure, and flaws are evident:
- inconsistent use of transitional strategies and/or little variety
- uneven progression of ideas from beginning to end
- conclusion and introduction, if present, are weak


Evidence/Elaboration
The response provides uneven, cursory support/evidence for the controlling idea or main idea that
includes partial or uneven use of sources, facts, and details:
- evidence from sources is weakly integrated, and citations, if present, are uneven
- weak or uneven use of elaborative techniques


The response expresses ideas unevenly, using simplistic language:
- use of domain-specific vocabulary that may at times be inappropriate for the audience and
  purpose


Conventions
The response demonstrates an adequate command of conventions:
- errors in usage and sentence formation may be present, but no systematic pattern of errors is
  displayed and meaning is not obscured
- adequate use of punctuation, capitalization, and spelling


1 Point Text: 


Statement of Purpose/Focus and Organization 

The response may be related to the topic but may provide little or no focus:
- may be very brief
- may have a major drift
- focus may be confusing or ambiguous


The response has little or no discernible organizational structure:
- few or no transitional strategies are evident
- frequent extraneous ideas may intrude


Evidence/Elaboration
The response provides minimal support/evidence for the controlling idea or main idea that includes little
or no use of sources, facts, and details:
- use of evidence from the source material is minimal, absent, incorrect, or irrelevant


The response's expression of ideas is vague, lacks clarity, or is confusing:
- uses limited language or domain-specific vocabulary
- may have little sense of audience and purpose


Conventions
The response demonstrates a partial command of conventions:
- errors in usage may obscure meaning
- inconsistent use of punctuation, capitalization, and spelling


0 Point Text:


A response gets no credit if it provides no evidence of the ability to structure and write an essay. 

The student must use evidence from the provided documents to write an argumentative essay for or 
against tourism in national parks. A full-credit response uses proper organization, focus, evidence, style, 
and vocabulary. 

Prompt: You have 70 minutes to review your sources, plan, draft, and revise your argumentative article. 
You may refer to the sources. Read your assignment and the information about how your article will be 
scored; then begin your work. 


Your Assignment 


Your class is planning a field trip to a national park. After researching about the role of the National Park
Service, you have been asked by your teacher to write an argumentative article about national parks for
the school newspaper.

Write an article that argues whether the National Park Service should or should not promote tourism for
national parks to increase attendance. Be sure that your argument acknowledges both sides of the issue
so that people know that you have considered the issue carefully. Support your claim with evidence from
the sources. You do not need to use all the sources, only the ones that most effectively support your
argument.

Prompt and Scoring Rubric for item 43438


Article Scoring 
Your argumentative article will be scored on the following criteria: 


1. Statement of purpose / focus and organization—How well did you clearly state your claim on the topic
and maintain your focus? How well did your ideas logically flow from the introduction to conclusion using
effective transitions? How well did you stay on topic throughout the article?

2. Elaboration of evidence—How well did you provide evidence from the sources to support your 
opinions? How well did you elaborate with specific information from the sources you reviewed? How well 
did you effectively express ideas using precise language that was appropriate for your audience and 
purpose? 

3. Conventions—How well did you follow the rules of usage, punctuation, capitalization, and spelling? 


Now begin work on your article. Manage your time carefully so that you can: 


- plan your article
- write your article
- revise and edit for a final draft


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


REMEMBER: A well-written argumentative article: 
- has a clear main idea
- is well-organized and stays on topic
- provides evidence from the sources to support your topic
- uses clear language that suits your purpose
- follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 

Statement of Purpose/Focus and Organization 

The response is fully sustained and consistently and purposefully focused:
- claim is clearly stated, focused, and strongly maintained
- alternate or opposing claims are clearly addressed
- claim is introduced and communicated clearly within the purpose, audience, and task


The response has a clear and effective organizational structure creating a sense of unity and
completeness:
- consistent use of a variety of transitional strategies to clarify the relationships between and
  among ideas
- logical progression of ideas from beginning to end
- effective introduction and conclusion for audience and purpose
- strong connections among ideas, with some syntactic variety


Evidence/Elaboration

The response provides thorough and convincing support/evidence for the writer's claim that includes the
effective use of sources, facts, and details. The response achieves substantial depth that is specific and
relevant:
- use of evidence from sources is integrated, comprehensive, relevant, and concrete
- effective use of a variety of elaborative techniques


The response clearly and effectively expresses ideas, using precise language:
- use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose


3 Point Text: 

Statement of Purpose/Focus and Organization 

The response is fully sustained and consistently and purposefully focused:
- claim is clearly stated, focused, and strongly maintained
- alternate or opposing claims are clearly addressed
- claim is introduced and communicated clearly within the purpose, audience, and task


The response has a clear and effective organizational structure creating a sense of unity and
completeness:
- consistent use of a variety of transitional strategies to clarify the relationships between and
  among ideas
- logical progression of ideas from beginning to end
- effective introduction and conclusion for audience and purpose
- strong connections among ideas, with some syntactic variety


Evidence/Elaboration
The response provides thorough and convincing support/evidence for the writer's claim that includes the
effective use of sources, facts, and details. The response achieves substantial depth that is specific and
relevant:
- use of evidence from sources is integrated, comprehensive, relevant, and concrete
- effective use of a variety of elaborative techniques


The response clearly and effectively expresses ideas, using precise language:
- use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose


2 Point Text: 


Statement of Purpose/Focus and Organization 
The response is somewhat sustained and may have a minor drift in focus:
• may be clearly focused on the claim but is insufficiently sustained, or
• claim on the issue may be somewhat unclear and/or unfocused

The response has an inconsistent organizational structure, and flaws are evident:
• inconsistent use of transitional strategies and/or little variety
• uneven progression of ideas from beginning to end
• conclusion and introduction, if present, are weak
• weak connection among ideas


Evidence/Elaboration

The response provides uneven, cursory support/evidence for the writer's claim that includes partial or
uneven use of sources, facts, and details. The response achieves little depth:
• evidence from sources is weakly integrated, and citations, if present, are uneven
• weak or uneven use of elaborative techniques

The response expresses ideas unevenly, using simplistic language:
• use of domain-specific vocabulary may at times be inappropriate for the audience and purpose

Conventions

The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation may be present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling

1 Point Text:
Statement of Purpose/Focus and Organization

The response may be related to the purpose but may provide little or no focus:
• may be very brief
• may have a major drift
• claim may be confusing or ambiguous

The response has little or no discernible organizational structure:
• few or no transitional strategies are evident
• frequent extraneous ideas may intrude

Prompt and Scoring Rubric for item 43469


Evidence/Elaboration

The response provides minimal support/evidence for the writer's claim that includes little or no use of
sources, facts, and details:
• use of evidence from sources is minimal, absent, incorrect, or irrelevant

The response's expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language or domain-specific vocabulary
• may have little sense of audience and purpose

Conventions

The response demonstrates a partial command of conventions:
• errors in usage may obscure meaning
• inconsistent use of punctuation, capitalization, and spelling


0 Point Text:


Insufficient, illegible, in a language other than English, incoherent, off-topic, or off-purpose writing 


The student must write an original story about traveling to the Moon, using information from the provided
documents. A full-credit response uses proper narrative techniques, details, style, and vocabulary.
Prompt: You will now have 70 minutes to review your documents, plan, draft, and revise your narrative
story. You may refer to the documents. You may also refer to the answer you wrote to the question in Part 
1, but you cannot change that answer. Now read your assignment and the information about how your 
story will be scored; then begin your work. 


Your Assignment 


Your class is studying a unit about science fiction literature. You have been learning about how science 
fiction stories are based on new understandings in science. 


Your assignment is to write a short science fiction story about traveling to the Moon or living on the Moon 
using current scientific knowledge to shape the story. You should include details from the source 
material for your narrative story. You do not need to use all the documents, only the ones that best 
support the details in your narrative story. 


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


Now begin work on your story. Manage your time carefully so that you can: 


• plan your story
• write your story
• revise and edit your story


Narrative Scoring 
Your narrative story will be scored on the following criteria: 


• develops a setting and characters
• uses a point of view
• has a plot with a beginning, middle, and end
• uses details and dialogue
• uses clear language that suits your purpose
• follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 


Establishment of Narrative Focus and Organization 
The narrative, real or imagined, is clearly focused and maintained throughout:
• effectively establishes a setting, narrator and/or characters, and point of view

The narrative, real or imagined, has an effective plot helping create unity and completeness:
• effective, consistent use of a variety of transitional strategies
• logical sequence of events from beginning to end
• effective opening and closure for audience and purpose

Development/Elaboration
The narrative, real or imagined, provides thorough and effective elaboration using details, dialogue, and
description:
• effective use of a variety of narrative techniques that advance the story or illustrate the
  experience

The narrative, real or imagined, clearly and effectively expresses experiences or events:
• effective use of sensory, concrete, and figurative language clearly advances the purpose


3 Point Text: 


Establishment of Narrative Focus and Organization 
The narrative, real or imagined, is adequately focused and generally maintained throughout:
• adequately establishes a setting, narrator and/or characters, and point of view

The narrative, real or imagined, has an evident plot helping create a sense of unity and completeness,
though there may be minor flaws and some ideas may be loosely connected:
• adequate use of a variety of transitional strategies
• adequate sequence of events from beginning to end
• adequate opening and closure for audience and purpose

Development/Elaboration
The narrative, real or imagined, provides adequate elaboration using details, dialogue, and description:
• adequate use of a variety of narrative techniques that generally advance the story or illustrate
  the experience

The narrative, real or imagined, adequately expresses experiences or events:
• adequate use of sensory, concrete, and figurative language generally advances the purpose


2 Point Text: 


Establishment of Narrative Focus and Organization 
The narrative, real or imagined, is somewhat maintained and may have a minor drift in focus:
• inconsistently establishes a setting, narrator and/or characters, and point of view

The narrative, real or imagined, has an inconsistent plot, and flaws are evident:
• inconsistent use of basic transitional strategies with little variety
• uneven sequence of events from beginning to end
• opening and closure, if present, are weak
• weak connection among ideas

Development/Elaboration
The narrative, real or imagined, provides uneven, cursory elaboration using partial and uneven details,
dialogue, and description:
• narrative techniques, if present, are uneven and inconsistent

The narrative, real or imagined, unevenly expresses experiences or events:
• partial or weak use of sensory, concrete, and figurative language that may not advance the
  purpose

The narrative, real or imagined, demonstrates an adequate command of conventions:
• errors in usage and sentence formation may be present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling


1 Point Text: 


Establishment of Narrative Focus and Organization 

The narrative, real or imagined, may be maintained but may provide little or no focus:
• may be very brief
• may have a major drift
• focus may be confusing or ambiguous

The narrative, real or imagined, has little or no discernible plot:
• few or no transitional strategies are evident
• frequent extraneous ideas may intrude

Prompt and Scoring Rubric for item 43479


Development/Elaboration
The narrative, real or imagined, provides minimal elaboration using little or no details, dialogue, and
description:
• use of narrative techniques is minimal, absent, in error, or irrelevant

The narrative, real or imagined, expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language
• may have little sense of purpose

Conventions

The narrative, real or imagined, demonstrates a partial command of conventions:
• frequent errors in usage may obscure meaning
• inconsistent use of punctuation, capitalization, and spelling


0 Point Text:


Insufficient, illegible, in a language other than English, incoherent, off-topic, or off-purpose writing 


The student must use evidence from the provided documents to write an argumentative essay for 
attending college either in a small town or a big city. A full-credit response uses proper organization, 
focus, evidence, style, and vocabulary. 
Prompt: The editor of your school newspaper has asked you to write an argumentative article about
whether students should go to college in a small town or a big city. After you analyze all of the sources,
determine which view you support. Compose a full-length argumentative essay in support of your view. In
your writing, use logical reasoning and evidence from the sources to support your claim and to refute
counterarguments.


Now begin work on your argumentative article. Manage your time carefully so that you can: 


• plan your article
• write your article
• revise and edit for a final draft


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


REMEMBER: A well-written article: 


• has a clear main idea
• is well-organized and stays on the topic
• provides evidence from the sources to support your topic
• uses clear language that suits your purpose
• follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 
The response is fully sustained and consistently and purposefully focused:
• claim is clearly stated, focused and strongly maintained
• alternate or opposing claims are clearly addressed
• claim is introduced and communicated clearly within the context

The response has a clear and effective organizational structure creating unity and completeness:
• effective, consistent use of a variety of transitional strategies
• logical progression of ideas from beginning to end
• effective introduction and conclusion for audience and purpose
• strong connections among ideas, with some syntactic variety

The response provides thorough and convincing support/evidence for the writer's claim that includes the
effective use of sources, facts, and details. The response achieves substantial depth that is specific and
relevant:
• use of evidence from sources is smoothly integrated, comprehensive, relevant, and concrete
• effective use of a variety of elaborative techniques

The response clearly and effectively expresses ideas, using precise language:
• use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose

The response demonstrates a strong command of conventions:
• few, if any, errors are present in usage and sentence formation
• effective and consistent use of punctuation, capitalization, and spelling


3 Point Text: 
The response is adequately sustained and generally focused:
• claim is clear and for the most part maintained, though some loosely related material may be
  present
• context provided for the claim is adequate

The response has an evident organizational structure and a sense of completeness, though there may
be minor flaws and some ideas may be loosely connected:
• adequate use of transitional strategies with some variety
• adequate progression of ideas from beginning to end
• adequate introduction and conclusion
• adequate, if slightly inconsistent, connection among ideas

The response provides adequate support/evidence for the writer's claim that includes the use of sources,
facts, and details. The response achieves some depth and specificity but is predominantly general:
• some evidence from sources is integrated, though citations may be general or imprecise
• adequate use of some elaborative techniques

The response adequately expresses ideas, employing a mix of precise with more general language:
• use of domain-specific vocabulary is generally appropriate for the audience and purpose

The response demonstrates an adequate command of conventions:
• some errors in usage and sentence formation may be present, but no systematic pattern of
  errors is displayed
• adequate use of punctuation, capitalization, and spelling


2 Point Text: 

The response is somewhat sustained and may have a minor drift in focus:
• may be clearly focused on the claim but is insufficiently sustained
• claim on the issue may be somewhat unclear and unfocused

The response has an inconsistent organizational structure, and flaws are evident:
• inconsistent use of basic transitional strategies with little variety
• uneven progression of ideas from beginning to end
• conclusion and introduction, if present, are weak
• weak connection among ideas

The response provides uneven, cursory support/evidence for the writer's claim that includes partial or
uneven use of sources, facts, and details, and achieves little depth:
• evidence from sources is weakly integrated, and citations, if present, are uneven
• weak or uneven use of elaborative techniques

The response expresses ideas unevenly, using simplistic language:
• use of domain-specific vocabulary may at times be inappropriate for the audience and purpose

The response demonstrates a partial command of conventions:
• frequent errors in usage may obscure meaning
• inconsistent use of punctuation, capitalization, and spelling


1 Point Text: 

The response may be related to the purpose but may offer little relevant detail:
• may be very brief
• may have a major drift
• claim may be confusing or ambiguous

The response has little or no discernible organizational structure:
• few or no transitional strategies are evident
• frequent extraneous ideas may intrude

The response provides minimal support/evidence for the writer's claim that includes little or no use of
sources, facts, and details:
• use of evidence from sources is minimal, absent, in error, or irrelevant

The response's expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language or domain-specific vocabulary
• may have little sense of audience and purpose

The response demonstrates a lack of command of conventions:
• errors are frequent and severe and meaning is often obscure


0 Point Text:
A response gets no credit if it provides no evidence of the ability to establish and support a formal 
argumentative claim. 


43504
The student must write an opinion essay about the region of the US they would like to live in. A full-credit
response includes details from the provided documents and uses proper organization, style, and
vocabulary.
Prompt: You have 70 minutes to review your sources, plan, draft, and revise your essay. Read your 
assignment and the information about how your essay will be scored; then begin your work. 


Your Assignment 


You have read three documents about different regions in the United States. You have been asked to 
give an opinion about these regions. 


Write an opinion essay about which region you would want to live in. State your opinion in the essay, 
using reasons and supporting details from the sources to explain what you like about this region and 
why. 


In Your Essay 


Write a well-organized essay that develops your opinion about which region you would want to live in. Be
sure to include reasons and details from the sources to support your opinion.


Prompt and Scoring Rubric for item 43504

Now begin work on your opinion essay. Manage your time carefully so that you can:
• plan your essay
• write your essay
• revise and edit for a final draft


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


REMEMBER: A well-written opinion essay: 


• has a clear opinion
• is well-organized and stays on the topic
• provides evidence and details from the sources to support your opinion
• uses clear language that suits your purpose
• follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 
Statement of Purpose/Focus and Organization 
The response is fully sustained and consistently and purposefully focused:
• opinion is clearly stated, focused, and strongly maintained
• opinion is communicated clearly within the purpose, audience, and task

The response has a clear and effective organizational structure creating a sense of unity and
completeness:
• consistent use of a variety of transitional strategies to clarify the relationships between and
  among ideas
• logical progression of ideas from beginning to end
• effective introduction and conclusion for audience and purpose

Evidence/Elaboration

The response provides thorough and convincing support/evidence for the writer's opinion that includes
the effective use of sources, facts, and details:
• use of evidence from sources is integrated, comprehensive, and relevant
• effective use of a variety of elaborative techniques

The response clearly and effectively expresses ideas, using precise language:
• use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose


3 Point Text: 


Statement of Purpose/Focus and Organization 


The response is adequately sustained and generally focused:
• opinion is clear and mostly maintained, though some loosely related material may be present
• context provided for the opinion is adequate within the purpose, audience, and task

The response has a recognizable organizational structure, and a sense of completeness, though there
may be minor flaws and some ideas may be loosely connected:
• adequate use of transitional strategies with some variety to clarify the relationships between and
  among ideas
• adequate progression of ideas from beginning to end
• adequate introduction and conclusion

Evidence/Elaboration

The response provides adequate support/evidence for the writer's opinion that includes the use of
sources, facts, and details:
• some evidence from sources is included, though citations may be general or imprecise
• adequate use of some elaborative techniques

The response adequately expresses ideas, employing a mix of precise with more general language:
• use of domain-specific vocabulary is generally appropriate for the audience and purpose


2 Point Text: 


Statement of Purpose/Focus and Organization 


The response is somewhat sustained and may have a minor drift in focus:
• may be clearly focused on the opinion but is insufficiently sustained, or
• opinion on the issue may be somewhat unclear and/or unfocused

The response has an inconsistent organizational structure, and flaws are evident:
• inconsistent use of transitional strategies and/or little variety
• uneven progression of ideas from beginning to end
• conclusion and introduction, if present, are weak

Evidence/Elaboration

The response provides uneven, cursory support/evidence for the writer's opinion that includes partial or
uneven use of sources, facts, and details:
• evidence from sources is weakly integrated, and citations, if present, are uneven
• weak or uneven use of elaborative techniques

The response expresses ideas unevenly, using simplistic language:
• use of domain-specific vocabulary may at times be inappropriate for the audience and purpose

Conventions

The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation are present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling


1 Point Text: 


Statement of Purpose/Focus and Organization 


The response may be related to the opinion but may provide little or no focus:
• may be very brief
• may have a major drift
• opinion may be confusing or ambiguous

The response has little or no discernible organizational structure:
• few or no transitional strategies are evident
• frequent extraneous ideas may intrude

Evidence/Elaboration

The response provides minimal support/evidence for the writer's opinion that includes little or no use of
sources, facts, and details:
• use of evidence from sources is minimal, absent, incorrect, or irrelevant

The response's expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language or domain-specific vocabulary
• may have little sense of audience and purpose

Conventions

The response demonstrates a partial command of conventions:
• errors in usage may obscure meaning
• inconsistent use of punctuation, capitalization, and spelling


0 Point Text:


Insufficient, illegible, in a language other than English, incoherent, off-topic, or off-purpose writing. 


43632
Using information from the provided documents, the student must write a report about different
methods of studying. A full-credit response uses proper organization, evidence, style, and vocabulary.
Prompt: Imagine you are the student who wrote the journal entry. You research better study skills and 
decide to share them with your friend Megan. Your guidance counselor hears about what you have 
learned and asks if you could write an informational report for all the students in your school. You accept 
her request. 


Write an informational report about different methods of improving memory and learning. Explain how 
effective these methods are and why. Then, recommend a general plan for all students. Support your 
report with evidence from the sources you have examined. 


Now begin work on your informational report. Manage your time carefully so that you can: 
• plan your report
• write your report
• revise and edit for a final draft


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


Prompt and Scoring Rubric for item 43632


REMEMBER: A well-written report:


• has a clear main idea
• is well-organized and stays on the topic
• provides evidence from the sources to support your topic
• uses clear language that suits your purpose
• follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 


The response is fully sustained, and consistently and purposefully focused:
• controlling idea or main idea of a topic is clearly stated, focused, and strongly maintained
• controlling idea or main idea of a topic is introduced and communicated clearly within the
  purpose, audience, and task

The response has a clear and effective organizational structure creating a sense of unity and
completeness:
• consistent use of a variety of transitional strategies to clarify the relationships between and
  among ideas
• logical progression of ideas from beginning to end
• effective introduction and conclusion for audience and purpose
• strong connections among ideas, with some syntactic variety

The response provides thorough and convincing support/evidence for the controlling idea or main idea
that includes the effective use of sources, facts, and details. The response achieves substantial depth
that is specific and relevant:
• use of evidence from sources is integrated, comprehensive, relevant, and concrete
• effective use of a variety of elaborative techniques

The response clearly and effectively expresses ideas, using precise language:
• use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose

The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation are present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling


3 Point Text: 


The response is adequately sustained and generally focused:
• controlling idea or main idea of a topic is clear and mostly maintained, though some loosely related
  material may be present
• some context for the controlling idea or main idea of the topic is adequate within the purpose,
  audience, and task

The response has an evident organizational structure and a sense of completeness, though there may
be minor flaws and some ideas may be loosely connected:
• adequate use of transitional strategies with some variety between and among ideas
• adequate progression of ideas from beginning to end
• adequate introduction and conclusion
• adequate, if slightly inconsistent, connection among ideas

The response provides adequate support/evidence for the controlling idea or main idea that includes the
use of sources, facts, and details:
• some evidence from sources is included, though citations may be general or imprecise
• adequate use of some elaborative techniques

The response adequately expresses ideas, employing a mix of precise with more general language:
• use of domain-specific vocabulary is generally appropriate for the audience and purpose

The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation are present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling

2 Point Text:


The response is somewhat sustained and may have a minor drift in focus:
• may be clearly focused on the controlling or main idea but is insufficiently sustained, or
• controlling idea or main idea may be unclear and/or somewhat unfocused

The response has an inconsistent organizational structure, and flaws are evident:
• inconsistent use of transitional strategies and/or little variety
• uneven progression of ideas from beginning to end
• conclusion and introduction, if present, are weak
• weak connection among ideas

The response provides uneven, cursory support/evidence for the controlling idea or main idea that
includes partial or uneven use of sources, facts, and details. The response achieves little depth:
• evidence from sources is weakly integrated, and citations, if present, are uneven

The response expresses ideas unevenly, using simplistic language:
• use of domain-specific vocabulary may at times be inappropriate for the audience and purpose

The response demonstrates a lack of command of conventions:
• errors are frequent and severe and meaning is often obscure


1 Point Text: 


The response may be related to the topic but may provide little or no focus:
• may be very brief
• may have a major drift
• focus may be confusing or ambiguous

The response has little or no discernible organizational structure:
• few or no transitional strategies are evident
• frequent extraneous ideas may intrude

The response provides minimal support/evidence for the controlling idea or main idea that includes little
or no use of sources, facts, and details:
• use of evidence from sources is minimal, absent, incorrect, or irrelevant

The response's expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language or domain-specific vocabulary
• may have little sense of audience and purpose

The response demonstrates a lack of command of conventions:
• errors are frequent and severe and meaning is often obscure


0 Point Text:


Insufficient, illegible, in a language other than English, incoherent, off-topic, or off-purpose writing 


The student must write an editorial about the pros and cons of cell phones in daily life. A full-credit
response includes information from the provided documents and uses proper organization, style, and
vocabulary.


Prompt and Scoring Rubric for item 43635


Prompt: You are interested in pursuing a career in journalism and decide to apply for a position with your 
school newspaper. The editor-in-chief asks you to submit a viewpoint editorial for consideration. You 
decide to write about cell phones. 


Write an argumentative essay that evaluates the pros and cons of cell phone use and states whether cell 
phones make daily life better or worse. Make sure to address potential counterarguments in your essay 
and support your claim with the sources you have examined. 


Now begin work on your argumentative essay. Manage your time carefully so that you can: 


• plan your essay
• write your essay
• revise and edit for a final draft


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


REMEMBER: A well-written argumentative essay: 


• has a clear main idea
• is well-organized and stays on the topic
• provides evidence from the sources to support your topic
• uses clear language that suits your purpose
• follows rules of writing (spelling, punctuation, and grammar)


4 Point Text: 


The response provides thorough and convincing support/evidence for the writer's claim that includes the
effective use of sources, facts, and details. The response achieves substantial depth that is specific and
relevant:
• use of evidence from sources is integrated, comprehensive, relevant, and concrete
• effective use of a variety of elaborative techniques

The response clearly and effectively expresses ideas, using precise language:
• use of academic and domain-specific vocabulary is clearly appropriate for the audience and
  purpose

The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation may be present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling

3 Point Text:


The response provides adequate support/evidence for the writer's claim that includes the use of
sources, facts, and details. The response achieves some depth and specificity but is predominantly
general:
• some evidence from sources is included, though citations may be general or imprecise
• adequate use of some elaborative techniques

The response adequately expresses ideas, employing a mix of precise with more general language:
• use of domain-specific vocabulary is generally appropriate for the audience and purpose

The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation may be present, but no systematic pattern of errors is
  displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling


2 Point Text: 


The response provides uneven, cursory support/evidence for the writer's claim that includes partial or 
uneven use of sources, facts, and details. The response achieves little depth: 

• evidence from sources is weakly integrated, and citations, if present, are uneven
• weak or uneven use of elaborative techniques


The response expresses ideas unevenly, using simplistic language:
• use of domain-specific vocabulary may at times be inappropriate for the audience and purpose


The response demonstrates a partial command of conventions:
• errors in usage may obscure meaning
• inconsistent use of punctuation, capitalization, and spelling


1 Point Text: 
The response provides minimal support/evidence for the writer's claim that includes little or no use of
sources, facts, and details:
• use of evidence from sources is minimal, absent, incorrect, or irrelevant


The response's expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language or domain-specific vocabulary
• may have little sense of audience and purpose


The response demonstrates a partial command of conventions:
• errors in usage may obscure meaning
• inconsistent use of punctuation, capitalization, and spelling


0 Point Text:


A response gets no credit if it provides no evidence of the ability to structure and write an essay. 


Item 43703: The student must use information in the provided documents to write an essay explaining what
constellations can tell you about different cultures. A full-credit response uses proper organization, style,
and vocabulary.


Prompt and Scoring Rubric for item 43703


Prompt: You have 70 minutes to review your sources, plan, draft, and revise your essay. You may refer to 
the sources. Read your assignment and the information about how your essay will be scored; then begin 
your work. 


Your Assignment 


Your class is studying a unit on the stars. Your teacher has asked you to write an essay based on your
research. 


Write an essay explaining what the names of the constellations can tell you about the people who named 
them and the attitudes people have had about the stars through the years. Include details from the 
sources in your essay. You do not need to use all the sources, only the ones that most effectively support 
the main ideas in your explanatory essay. 


Essay Scoring 
Your explanatory essay will be scored on the following criteria: 


1. Statement of purpose / focus and organization—How well did you clearly state your main idea? How 
well did your ideas logically flow from the introduction to conclusion using effective transitions? How well 
did you stay on topic throughout the essay? 

2. Elaboration of evidence—How well did you provide evidence from the sources to support your main
ideas? How well did you elaborate with specific information from the sources you reviewed? How well did 
you effectively express ideas using precise language that was appropriate for your audience and 
purpose? 

3. Conventions—How well did you follow the rules of usage, punctuation, capitalization, and spelling? 


Now begin work on your essay. Manage your time carefully so that you can: 
• plan your essay
• write your essay
• revise and edit for a final draft


Type your response in the space provided. Write as much as you need to fulfill the requirements of the 
task; you are not limited by the size of the response area on the screen. 


4 Point Text: 
Statement of Purpose/Focus and Organization 
The response is fully sustained, and consistently and purposefully focused: 
• controlling idea or main idea of a topic is clearly stated, focused, and strongly maintained
• controlling idea or main idea of a topic is introduced and communicated clearly within the
purpose, audience, and task


The response has a clear and effective organizational structure creating a sense of unity and
completeness:
• consistent use of a variety of transitional strategies to clarify the relationships between and
among ideas
• logical progression of ideas from beginning to end
• effective introduction and conclusion for audience and purpose
• strong connections among ideas, with some syntactic variety


Evidence/Elaboration 
The response provides thorough and convincing support/evidence for the controlling idea or main idea 
that includes the effective use of sources, facts, and details. The response achieves substantial depth 
that is specific and relevant: 

• use of evidence from sources is integrated, comprehensive, relevant, and concrete
• effective use of a variety of elaborative techniques


The response clearly and effectively expresses ideas, using precise language:
• use of academic and domain-specific vocabulary is clearly appropriate for the audience and
purpose


3 Point Text: 
Statement of Purpose/Focus and Organization 
The response is adequately sustained and generally focused: 
• controlling idea or main idea of a topic is clear and mostly maintained, though some loosely related
material may be present
• some context for the controlling idea or main idea of the topic is adequate within the purpose,
audience, and task


The response has an evident organizational structure and a sense of completeness, though there may 
be minor flaws and some ideas may be loosely connected: 

• adequate use of transitional strategies with some variety between and among ideas
• adequate progression of ideas from beginning to end
• adequate introduction and conclusion
• adequate, if slightly inconsistent, connection among ideas


Evidence/Elaboration 
The response provides adequate support/evidence for the controlling idea or main idea that includes the 
use of sources, facts, and details: 

• some evidence from sources is included, though citations may be general or imprecise
• adequate use of some elaborative techniques


The response adequately expresses ideas, employing a mix of precise with more general language:
• use of domain-specific vocabulary is generally appropriate for the audience and purpose


2 Point Text: 

Statement of Purpose/Focus and Organization 

The response is somewhat sustained and may have a minor drift in focus: 
• may be clearly focused on the controlling or main idea but is insufficiently sustained, or
• controlling idea or main idea may be unclear and/or somewhat unfocused


The response has an inconsistent organizational structure, and flaws are evident:
• inconsistent use of transitional strategies and/or little variety
• uneven progression of ideas from beginning to end
• conclusion and introduction, if present, are weak
• weak connection among ideas


Evidence/Elaboration 
The response provides uneven, cursory support/evidence for the controlling idea or main idea that 
includes partial or uneven use of sources, facts, and details. The response achieves little depth: 

• evidence from sources is weakly integrated, and citations, if present, are uneven
• weak or uneven use of elaborative techniques


The response expresses ideas unevenly, using simplistic language:
• use of domain-specific vocabulary may at times be inappropriate for the audience and purpose


Conventions
The response demonstrates an adequate command of conventions:
• errors in usage and sentence formation are present, but no systematic pattern of errors is
displayed and meaning is not obscured
• adequate use of punctuation, capitalization, and spelling


1 Point Text: 

Statement of Purpose/Focus and Organization 

The response may be related to the topic but may provide little or no focus: 
• may be very brief
• may have a major drift
• focus may be confusing or ambiguous


The response has little or no discernible organizational structure:
• few or no transitional strategies are evident
• frequent extraneous ideas may intrude


Evidence/Elaboration 
The response provides minimal support/evidence for the controlling idea or main idea that includes little 
or no use of sources, facts, and details: 

• use of evidence from sources is minimal, absent, incorrect, or irrelevant


The response's expression of ideas is vague, lacks clarity, or is confusing:
• uses limited language or domain-specific vocabulary
• may have little sense of audience and purpose

Conventions

The response demonstrates a lack of command of conventions:
• errors are frequent and severe and meaning is often obscure


0 Point Text:


Insufficient, illegible, in a language other than English, incoherent, off-topic, or off-purpose writing 




Appendix C 


Item Analysis Results by Grade and Form


Grade 4 Mathematics 


Table A1. Grade 4 Mathematics: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Table A2. Grade 4 Mathematics: Distribution of DIF Item Categorizations for Selected Groups 


LEP Female Hispanic
vs. vs. vs.
non-LEP Male White


es Number of Students 
Focal | ast 
[Reference [94 


Grade 7 Mathematics 


Table B1. Grade 7 Mathematics: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Hispanic
vs.
White


100 


Grade 11 Mathematics 


Table C1. Grade 11 Mathematics: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Hispanic
vs.
White


Number of Students 
107 


Number of Items 


a 

ee ee 
a 
re 
ee ee 
ee ee 


101 


Grade 4 Reading 


Table D1. Grade 4 Reading: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Number of Items 


102 


Grade 7 Reading 


Table E1. Grade 7 Reading: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Table E2. Grade 7 Reading: Distribution of DIF Item Categorizations for Selected Groups 


Female Hispanic 
Vs. vs. 
Male White 


Number of Students 
Focal 33 


O 


A 
A 


C+ 
B+ 

+ 
C- 


103 


Grade 11 Reading 


Table F1. Grade 11 Reading: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Hispanic
vs.
White


104 


Grade 4 Writing Form A 


Table G1. Grade 4 Writing: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Number of Items 


105 


Grade 4 Writing Form B 


Table H1. Grade 4 Writing: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Number of Items 


106 


Grade 4 Writing Form C 


Table I1. Grade 4 Writing: Item Analysis Summary Statistics


Item Item
Difficulty Discrimination


Number of Items 


107 


Grade 7 Writing Form A 


Table J1. Grade 7 Writing: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


N=698 


Table J2. Grade 7 Writing: Distribution of DIF Item Categorizations for Selected Groups 


Female Hispanic 
Vs. vs. 
Male White 


Number of Students 
Focal 33 
5 


9 
8 


3 
A 
11 


A 
A 


C+ 
B+ 

+ 
C- 


108 


Grade 7 Writing Form B 


Table K1. Grade 7 Writing: Item Analysis Summary Statistics 


Item Item 


N=703 Difficulty Discrimination 


Table K2. Grade 7 Writing: Distribution of DIF Item Categorizations for Selected Groups 


Female Hispanic 
Vs. vs. 
Male White 


Number of Students 


Focal 3 


6 
z 


1 
2 


A 
A 


C+ 
B+ 

+ 
C- 


109 


Grade 7 Writing Form C 


Table L1. Grade 7 Writing: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Table L2. Grade 7 Writing: Distribution of DIF Item Categorizations for Selected Groups 


Female
vs.
Male


Number of Students
Number of Items


110 


Grade 11 Writing Form A 


Table M1. Grade 11 Writing: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


Table M2. Grade 11 Writing: Distribution of DIF Item Categorizations for Selected Groups 


Female Hispanic 
Vs. vs. 
Male White 


Number of Students 


Focal 30 
6 


9 
A 


1 
2 


A 
A 


1 
3 
2 


C+ 
B+ 

+ 
C- 


ae ee 


111 


Grade 11 Writing Form B 


Table N1. Grade 11 Writing: Item Analysis Summary Statistics 


Item Item
Difficulty Discrimination


N=573 


Table N2. Grade 11 Writing: Distribution of DIF Item Categorizations for Selected Groups


Female Hispanic 
Vs. vs. 
Male White 


Number of Students 


Focal 285 


A 
A 


2 


C+ 
B+ 

+ 
C- 


112 


Grade 11 Writing Form C 


Table O1. Grade 11 Writing: Item Analysis Summary Statistics


Item Item
Difficulty Discrimination


Hispanic
vs.
White


113 


Appendix D 


Agreement Tables for Multiple Dimension Items 


Grade 4 43504 
ORGANIZATION 
First Human Scorer by Automated Score 
First Human Automated Score 
Scorer 
raat | 268 | a8, | 
Grade 4 43504 
ORGANIZATION 
First Human Scorer by Second Human Scorer 
First Human Scorer Second Human Scorer 
1 es 
2 
Grade 4 43504 First Human Scorer by Automated Score 
ELABORATION 


First Human Scorer mee Score 


114 


Grade 4 


Grade 4 


43504 
ELABORATION 


43504 
CONVENTIONS 


First Human Scorer by Second Human Scorer 


First Human Scorer ssa Human Scorer 
A ee 


First Human Scorer by Automated Score 


First Human Scorer ee Score 


>. 
a 
ee 


115 


Grade 4 43334 
ORGANIZATION 
Grade 4 43334 
ORGANIZATION 
Grade 4 43334 
ELABORATION 
Grade 4 43334 
ELABORATION 


First Human Scorer by Automated Score 


First Human Scorer 
23 [a rota 
ae a 

2 


2 fo] 233 
pis] oO 234 
Em 


4 og 


First Human Scorer by Second Human Scorer 


First Human Scorer 


4 off ot 


First Human Scorer by Automated Score 


First Human 
Scorer 


Ca 2 3] 4 | Total 


U2 ee 


RE 


3 of ee 


First Human Scorer by Second Human Scorer 


First Human 
scorer 
1 
2 


Second Human Scorer 


(Second Human Scorer 


O[ 226 


a 


116 


Grade 4 43334 


CONVENTIONS 
Grade 4 

43284 

ORGANIZATION 
Grade 4 43284 

ORGANIZATION 


First Human Scorer by Automated Score 


First Human Scorer 
Oi | 2 | Total 
re 


First Human Scorer by Automated Score 


First Human Scorer 


1 


4 oo a 


First Human Scorer by Second Human Scorer 


First Human Scorer 
a 213 [4] Total 


ee 


117 


43284 


Grade 4 ELABORATION 

Grade 4 43284 
ELABORATION 

Grade 4 43284 
CONVENTIONS 


First Human Scorer by Automated Score 


First Human Scorer 
a 21 3 [4 Total 
Co 


1 


ee se 


First Human Scorer by Second Human Scorer 


First Human Score 
Pofoy 23 


——— ro 432 
ee 


ee 


First Human Scorer by Automated Score 


First Human Scorer Automated Score 


Soy a | 2 | Total 


2 
15d 


a a5 


118 


Grade 7 


Grade 7 


Grade 7 


43438 
ORGANIZATION 


43438 
ORGANIZATION 


43438 
ELABORATION 


First Human Scorer by Automated Score 


First Human Scorer 
eee a 8 Toa 
of 79 
2 


a ee ee 
a a ee 


First Human Scorer by Second Human Scorer 


First Human Scorer 


First Human Scorer by Automated Score 


First Human Scorer 
/ Oo} 81 


: 


ee ee 
EE 


119 


Grade 7 43438 


ELABORATION 
Grade 7 43438 

CONVENTIONS 
Grade 7 43703 

ORGANIZATION 


First Human Scorer by Second Human Scorer 


First Human Scorer 
oy st 


2 


DE 


First Human Scorer by Automated Score 


First Human Scorer 
ofa | 2 | Total 


First Human Scorer by Automated Score 


First Human Scorer 
i ee 
of 210 


2 


ee 
oat tos 6 [| a 


120 


Grade 7 


Grade 7 


Grade 7 


43703 
ORGANIZATION 


43703 
ELABORATION 


43703 
ELABORATION 


First Human Scorer by Second Human Scorer 


First Human Scorer 
Pod | 2 | 8 | Total 

ee eo 
Ay 61 


3 


First Human Scorer by Automated Score 


First Human Scorer 
oy a2 
ee ee 
[Total oes] tf ae 


First Human Scorer by Second Human Scorer 


First Human Scorer Second Human Scorer 
a 
ee ee 


121 


Grade 7 


Grade 7 


Grade 7 


43703 
CONVENTIONS 


43469 
ORGANIZATION 


43469 
ORGANIZATION 


First Human Scorer by Automated Score 


First Human Scorer 


Soy a | 2 | Total 
Poy 6s. 
51 


x3 
Total af 09 [a3 | are 


miles lelpatcle 


First Human Scorer 


Tore) =) OM NU ING) Aaa 1K\0 melee) «= 


Popo] 7s 
oy 3s 
a) 


First Human Scorer by Second Human Scorer 


First Human Scorer 


Second Human Scorer 


BS al bial sia 
ae cm wttt se 


122 


Grade 7 43469 


ELABORATION 
Grade 7 43469 

ELABORATION 
Grade 7 43469 

CONVENTIONS 


First Human Scorer by Automated Score 


First Human Scorer 

a] 2 3/4 | Tota 
Popoy at 
oy 56, 
ee 


4 


First Human Scorer by Second Human Scorer 


First Human Scorer 
Co 


4 off og 


First Human Scorer by Automated Score 


First Human Scorer 
So] a | 2 | Total 

86 
Total [6 | 20 [8 


123 


Grade 11 43632 First Human Scorer by Automated Score 


ORGANIZATION 
First Human Scorer Automated Score 
2 
= 
a 
Grade 11 43632 
ORGANIZATION First Human Scorer by Second Human Scorer 
First Human Scorer Second Human Scorer 
i [of 38 
2 
a a 
Grade 11 43632 
ELABORATION First Human Scorer by Automated Score 


2 isis 
SS 
+ 


; 


ss 
__* 
4 


124 


Grade 11 


Grade 11 


Grade 11 


43632 
ELABORATION 


43632 
CONVENTIONS 


43635 
ORGANIZATION 


First Human Scorer by Second Human Scorer 


1 
2 


ee ee Te 


First Human Scorer 


oO 


al 
19 


First Human Scorer by Automated Score 


First Human Scorer 


Automated Score 


First Human Scorer by Automated Score 


oy as 


First Human Scorer 


= 
| 


125 


Grade 11 43635 


ORGANIZATION 
Grade 11 43635 

ELABORATION 
Grade 11 43635 

ELABORATION 


First Human Scorer by Second Human Scorer 


First Human Scorer Second Human Scorer 


of 


ee 


First Human Scorer by Automated Score 


First Human Scorer 
ee ee i6 


First Human Scorer by Second Human Scorer 


First Human Scorer 
oy as 
a6 


126 


Grade 11 


Grade 11 


Grade 11 


43635 
CONVENTIONS 


43479 
ORGANIZATION 


43479 
ORGANIZATION 


First Human Scorer by Automated Score 


First Human Scorer 
ee ee 
i 

24 
5 


ee ee 


First Human Scorer by Automated Score 


First Human Scorer 


tal 


First Human Scorer by Second Human Scorer 


First Human Scorer 
ope 
38 


38 


er 
toate ef 


Id 


Grade 11 


Grade 11 


Grade 11 


43479 
ELABORATION 


43479 
ELABORATION 


43479 
CONVENTIONS 


First Human Scorer by Automated Score 


First Human Scorer 
oy 28 
a 


2 


ee —s 
tot fo 


First Human Scorer by Second Human Scorer 


First Human Scorer 
oy 28 
2 


2 


ee = 
Dd 


First Human Scorer by Automated Score 


First Human Score 
Pot Total 
ED 
28 


>. 


31 
etal 


128 


Grade 11 43479 
CONVENTIONS First Human Scorer by Second Human Scorer 


First Human Scorer 
oT a | 2 | Total 
ee Co 


al 
10 


ee 
totals |e 


129 


Grade 4 MATH 


Grade 4 MATH 


Grade 4 MATH 


Appendix E 


Agreement Tables for Propositional Model 


43572 
First Human Scorer by Automated Score 
Automated Score 
43572 
43564 


First Human Scorer by Automated Score 


Automated Score 


Sea] 


155 


130 


Grade 4 MATH 43564 First Human Scorer by Second Human Scorer 


Second Human Scorer 


Grade 4 MATH 43173 
First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 


! 


Grade 4 MATH 43173 
First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer 


Grade 7 MATH 43551 
First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 
Pol sf at 
ec 


Total 176 32 208 


Grade 7 MATH 43551 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer 


Grade 7 MATH 43555 
First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 


Grade 7 MATH 43555 
First Human Scorer by Second Human Scorer 


Second Human Scorer 


Grade 7 MATH 43557 
First Human Scorer by Automated Score 


Automated Score 


First Human Scorer of af Tota 
=. 


132 


Grade 7 MATH 


Grade 7 MATH 


Grade 7 MATH 


Total 138 51 189 


43557 


First Human Scorer by Second Human Scorer 
Second Human Scorer 


First Human Scorer 
0} 2} Total 


43639 
First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 


43639 
First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer 
EE 
Tatas 


133 


First Human Scorer by Automated Score 


Grade 7 MATH 43559 Automated Score 


Grade 7 MATH 43559 
First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer 
1 Total 


52 


p82 
ee 


Grade 7 MATH 43552 
First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 
Total 


a) 


Grade 7 MATH 43552 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


134 


Grade 4 READING 


Grade 4 READING 


Grade 4 READING 


43707 


43707 


43412 


First Human Scorer by Automated Score 
Automated Score 


First Human Scorer 
0} 4} 2] Toa 


__ © Js P| 304 
| 126 


113 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human 


| 304 


CO 
——— 


First Human Scorer by Automated Score 


Automated Score 


1 Total 


ee) eed 
ot er 
8 es 
oat _———=«dSe6 | tto] ts] eam 


First Human Scorer 


135 


Grade 4 READING 43412 
First Human Scorer by Second Human Scorer 


Second Human Scorer 


ta 
a 2A 


Grade 4 READING 43416 
First Human Scorer by Automated Score 
Automated Score 
4 READIN 4341 
ulate ‘ ne First Human Scorer by Second Human Scorer 
Second Human Scorer 
Grade 7 READING 43248 


First Human Scorer by Automated Score 


Automated Score 


136 


Grade 7 READING A3248 First Human Scorer by Second Human Scorer 


Second Human Scorer 


Grade 7 READING 43445 
First Human Scorer by Automated Score 
Automated Score 
First Human Scorer 
Total 
Grade 7 READING 43445 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


137 


Grade 7 READING 


Grade 7 READING 


Grade 11 READING 


43422 


43422 


43297 


First Human Scorer by Automated Score 


Automated Score 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 


138 


Grade 11 READING 43297 
First Human Scorer by Second Human Scorer 


Second Human Scorer 


Grade 11 READING 43435 
First Human Scorer by Automated Score 
Automated Score 

Grade 11 READING 43435 


First Human Scorer by Second Human Scorer 


First Second Human Scorer 


Human 
Scorer 


139 


Grade 11 READING 43397 First Human Scorer by Automated Score 


Automated Score 


Grade 11 READING 43397 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer 
1 2 Total 


Grade 4 WRITING 43280 
Form C First Human Scorer by Automated Score 


Automated Score 


First Human Scorer 
1 2 Total 


OO 
ee e) 


140 


Grade 4 WRITING 
Form C 


Grade 7 WRITING 
Form C 


Grade 7 WRITING 
Form C 


Grade 11 WRITING 
Form C 


43280 


43468 


43468 


43491 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer by Automated Score 


Automated Score 


First Human Scorer by Second Human Scorer 


Second Human Scorer 


First Human Scorer by Automated Score 


Automated Score 


141 


Grade 11 WRITING 43491 
Form C First Human Scorer by Second Human Scorer 


Second Human Scorer 


142 


Appendix I: Adaptive Item Selection Report




Smarter Balanced Adaptive 
Item Selection Algorithm 
Design Report 


Preview Release 16 May 2014 


Jon Cohen and Larry Albright 
American Institutes for Research 




Smarter Balanced Assessment Consortium


Produced for Smarter Balanced by the American 
Institutes for Research 


This work is licensed under a Creative Commons 
Attribution-NoDerivatives 4.0 International License. 


Smarter Balanced Adaptive Item Selection Algorithm 


TABLE OF CONTENTS

1. INTRODUCTION, BACKGROUND, AND DEFINITIONS
1.1 Blueprint
1.2 Content Value
1.2.1 Content Value for Single Items
1.2.2 Content Value for Sets of Items
1.3 Information Value
1.3.1 Individual Information Value
1.3.1.1 Binary Items
1.3.1.2 Polytomous Items
1.3.2 Item Group Information Value

2. ENTRY AND INITIALIZATION
2.1 Item Pool
2.2 Adjust Segment Length
2.3 Initialization of Starting Theta Estimates
2.4 Insertion of Embedded Field-Test Items

3. ITEM SELECTION
3.1 Trimming the Custom Item Pool
3.2 Recycling Algorithm
3.3 Adaptive Item Selection
3.4 Selection of the Initial Item
3.5 Exposure Control

4. TERMINATION

A1. DEFINITIONS OF USER-SETTABLE PARAMETERS
A2. API
A3. SUPPORTING DATA STRUCTURES


i American Institutes for Research 




SMARTER BALANCED ADAPTIVE ITEM SELECTION ALGORITHM 


1. INTRODUCTION, BACKGROUND, AND DEFINITIONS 


This document describes the Smarter Balanced adaptive item selection algorithm. The item 
selection algorithm is designed to cover a standards-based blueprint, which may include content, 
cognitive complexity, and item type constraints. The item selection algorithm will also include: 


• the ability to customize an item pool based on access constraints and screen items that
have been previously viewed or may not be accessible for a given individual;
• a mechanism for inserting embedded field-test items; and
• a mechanism for delivering “segmented” tests in which separate parts of the test are
administered in a fixed order.


This document describes the algorithm and the design for its implementation for the Smarter 
Balanced Test Delivery System. The implementation builds extensively on the algorithm 
implemented in AIR’s Test Delivery System. The implementation described is released under a
Creative Commons Attribution, No Derivatives license. 


The general approach described here is based on a highly parameterized multiple-objective
utility function. The objective function includes:


• a measure of content match to the blueprint;
• a measure of overall test information; and
• measures of test information for each reporting category on the test.


We define an objective function that measures an item’s contribution to each of these objectives,
weighting them to achieve the desired balance among them. Equation 1 sketches this objective
function for a single item.


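$$
f_{ijt} = w_2 \, \frac{\sum_{r=1}^{R} s_{rit}\, p_r\, d_{rj}}{\sum_{r=1}^{R} d_{rj}}
        + w_1 \sum_{k=1}^{K} q_k \, h_1(v_{kijt}, V_{kit}, t_k)
        + w_0 \, h_0(u_{ijt}, U_{it}, t_0) \qquad (1)
$$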

where the terms $w$ represent user-supplied weights that assign relative importance to meeting
each of the objectives, $d_{rj}$ indicates whether item $j$ has the blueprint-specified feature $r$,
and $p_r$ is the user-supplied priority weight for feature $r$. The term $s_{rit}$ is an adaptive
control parameter that is described below. In general, $s_{rit}$ increases for features that have
not met their designated minimum as the end of the test approaches.


The remaining terms represent an item’s contribution to measurement precision:


• $v_{kijt}$ is the value of item $j$ toward reducing the measurement error for reporting category $k$
for examinee $i$ at selection $t$; and
• $u_{ijt}$ is the value of item $j$ in terms of reducing the overall measurement error for examinee
$i$ at selection $t$.


The terms $U_{it}$ and $V_{kit}$ represent the total information overall and on reporting category $k$,
respectively.


The term $q_k$ is a user-supplied priority weight associated with the precision of the score estimate
for reporting category $k$. The terms $t$ represent precision targets for the overall score ($t_0$) and
for each reporting category score ($t_k$). The functions $h(\cdot)$ are given by:


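$$
h_0(u_{ijt}, U_{it}, t_0) =
\begin{cases}
a\,u_{ijt} & \text{if } U_{it} < t_0 \\
b\,u_{ijt} & \text{otherwise}
\end{cases}
$$

$$
h_1(v_{kijt}, V_{kit}, t_k) =
\begin{cases}
c\,v_{kijt} & \text{if } V_{kit} < t_k \\
d\,v_{kijt} & \text{otherwise}
\end{cases}
$$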

Items can be selected to maximize the value of this function. This objective function can be
manipulated to produce a pure, standards-free adaptive algorithm by setting $w_2$ to zero or a
completely blueprint-driven test by setting $w_1 = w_0 = 0$. Adjusting the weights to optimize
performance for a given item pool will enable users to maximize information subject to the
constraint that the blueprint is virtually always met.
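
The weighted objective described above can be sketched in Python. This is only an illustration under stated assumptions, not the report's implementation: the function names, the slope parameters (`h0_below`, `h0_above`, `h1_below`, `h1_above`) and their default values are hypothetical, and the handling of an item with no blueprint features is a guess made to avoid dividing by zero.

```python
from typing import Sequence


def h(value: float, total: float, target: float,
      below: float, above: float) -> float:
    """Scale an item's information value by `below` while the accumulated
    information `total` is under `target`, and by `above` afterwards."""
    return (below if total < target else above) * value


def item_objective(
    d: Sequence[int],       # d_rj: 1 if item j has blueprint feature r, else 0
    s: Sequence[float],     # s_rit: adaptive control parameter for feature r
    p: Sequence[float],     # p_r: priority weight for feature r
    v: Sequence[float],     # v_kijt: item value for reporting category k
    V: Sequence[float],     # V_kit: information accumulated on category k
    t: Sequence[float],     # t_k: precision target for category k
    q: Sequence[float],     # q_k: precision-priority weight for category k
    u: float,               # u_ijt: item value for the overall score
    U: float,               # U_it: overall information accumulated so far
    t0: float,              # t_0: overall precision target
    w0: float, w1: float, w2: float,
    h0_below: float = 1.0, h0_above: float = 0.0,   # hypothetical slopes
    h1_below: float = 1.0, h1_above: float = 0.0,   # hypothetical slopes
) -> float:
    """Weighted sum of content match, category information, and overall information."""
    n_features = sum(d)
    # Assumption: an item with no blueprint features has zero content value.
    content = (
        sum(s_r * p_r * d_r for s_r, p_r, d_r in zip(s, p, d)) / n_features
        if n_features else 0.0
    )
    category_info = sum(
        q_k * h(v_k, V_k, t_k, h1_below, h1_above)
        for q_k, v_k, V_k, t_k in zip(q, v, V, t)
    )
    overall_info = h(u, U, t0, h0_below, h0_above)
    return w2 * content + w1 * category_info + w0 * overall_info


# Made-up inputs: three blueprint features, one reporting category.
example = dict(
    d=[1, 0, 1], s=[1.0, 1.0, 2.0], p=[1.0, 5.0, 0.5],
    v=[0.4], V=[0.2], t=[1.0], q=[2.0],
    u=0.5, U=3.0, t0=2.0,
)
balanced = item_objective(**example, w0=1.0, w1=1.0, w2=1.0)
standards_free = item_objective(**example, w0=1.0, w1=1.0, w2=0.0)
blueprint_only = item_objective(**example, w0=0.0, w1=0.0, w2=1.0)
```

Setting `w2` to zero drops the content term entirely, and setting `w1` and `w0` to zero leaves only the content term, which mirrors the two limiting cases named in the text.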


We note that the computations of the content values and information values generate values on 
very different scales and that the scale of the content value varies as the test progresses. 
Therefore, we normalize both the information and content values before computing the value of
Equation 1. This normalization is given by

$$
x =
\begin{cases}
1 & \text{if } \min = \max \\
\dfrac{v - \min}{\max - \min} & \text{otherwise}
\end{cases}
$$

where min and max represent the minimum and maximum, respectively, of the metric computed over
the current set of items or item groups.
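
As a minimal sketch of this normalization, assuming the metric values for the current candidate set are held in a plain Python sequence (the function name `normalize` is illustrative and does not come from the report):

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max rescale a metric over the current candidate set.

    Every value maps to 1 when all candidates share the same value,
    which avoids dividing by zero.
    """
    lo, hi = min(values), max(values)  # raises ValueError on an empty set
    if lo == hi:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


content_scaled = normalize([2.0, 4.0, 6.0])
tied_scaled = normalize([3.0, 3.0])
```

Applying the same rescaling to the content values and the information values puts both on a 0 to 1 range, so the weights act on comparable quantities.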


The remainder of this section describes the overall program flow, the form of the blueprint, and 
the various value calculations employed in the objective function. Subsequent sections describe 
the details of the selection algorithm. 


1.1 Blueprint 


Each test will be described by a single blueprint for each segment of the test and will identify the
order in which the segments appear. The blueprint will include:


• an indicator of whether the test is adaptive or fixed form;
• termination conditions for the segment, which are described in a subsequent section;
• a set of nested content constraints, each of which is expressed as:
  - the minimum number of items to be administered within the content category;
  - the maximum number of items to be administered within the content category;
  - an indication of whether the maximum should be deterministically enforced (a
    “strict” maximum);
  - a priority weight for the content category, $p_r$;
  - an explicit indicator as to whether this content category is a reporting category; and
  - an explicit precision-priority weight ($q_k$) for each group identified as a reporting
    category.
• a set of non-nested content constraints, which are represented as:
  - a name for the collection of items meeting the constraint;
  - the minimum number of items to be administered from this group of items;
  - the maximum number of items to be administered from this group of items;
  - an indication of whether the maximum should be deterministically enforced (a
    “strict” maximum);
  - a priority weight for the group of items, $p_r$;
  - an explicit indicator as to whether this named group will make up a reporting
    category; and
  - an explicit precision-priority weight ($q_k$) for each group identified as a reporting
    category.


— The priority weights, p, on the blueprint, can be used to express values in the 
blueprint match. Large weights on reporting categories paired with low (or zero) 
weights on the content categories below them may allow more flexibility to maximize 
information in a content category covering fewer fine-grained targets, while the 
reverse would mitigate toward more reliable coverage of finer-grained categories, 
with less content flexibility within reporting categories. 


An example of a blueprint specification appears in Appendix 1. 


Each segment of a test will have a separate blueprint. 
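One way to picture the blueprint elements listed above is as a pair of small record types. The sketch below is illustrative only: the class and field names are hypothetical, the default values are assumptions, and termination conditions are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentConstraint:
    """One nested or non-nested content constraint (hypothetical field names)."""
    name: str
    min_items: int
    max_items: int
    strict_max: bool = False                 # deterministically enforce the maximum
    priority_weight: float = 1.0             # p_r
    is_reporting_category: bool = False
    precision_priority_weight: Optional[float] = None  # q_k, reporting categories only
    children: List["ContentConstraint"] = field(default_factory=list)  # nesting


@dataclass
class SegmentBlueprint:
    """Blueprint for a single test segment (hypothetical field names)."""
    adaptive: bool
    nested_constraints: List[ContentConstraint] = field(default_factory=list)
    non_nested_constraints: List[ContentConstraint] = field(default_factory=list)


# Example: a reporting category with one finer-grained category nested below it.
algebra = ContentConstraint("Algebra", min_items=4, max_items=6, priority_weight=2.0)
claim1 = ContentConstraint("Claim 1", 8, 12, is_reporting_category=True,
                           precision_priority_weight=1.0, children=[algebra])
segment = SegmentBlueprint(adaptive=True, nested_constraints=[claim1])
```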


1.2 Content Value 


Each item or item group will be characterized by its contribution to meeting the blueprint, given 
the items that have already been administered at any point. The contribution is based on the 
presence or absence of features specified in the blueprint and denoted by the term d in 
Equation 1. This section describes the computation of the content value.


1.2.1 Content Value for Single Items 
For each constraint appearing in the blueprint ($r$), an item $j$ either does or does not have the
characteristic described by the constraint. For example, a constraint might require a minimum of
four and a maximum of six algebra items. An item measuring algebra has the described
characteristic, and an item measuring geometry but not algebra does not. To capture this constraint,
we define the following:


• $d_j$ is a feature vector in which the elements are $d_{jr}$, summarizing item $j$’s contribution to
meeting the blueprint. This feature vector includes content categories such as claims and
targets as well as other features of the blueprint, such as Depth of Knowledge and item
type.


• $S_{it}$ is a diagonal matrix, the diagonal elements of which are the adaptive control
parameters $s_{rit}$.


• $p$ is the vector containing the user-supplied priority weights $p_r$.


The scalar content value for an item is given by $C_{ijt} = d_j^{\top} S_{it}\, p$.


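Letting $z_{rit}$ represent the number of items with feature $r$ administered to student $i$ by iteration $t$,
the value of the adaptive control parameters is:

$$s_{rit} = \begin{cases}
m_{it}\left(2 - \dfrac{z_{rit}}{Min_r}\right) & \text{if } z_{rit} < Min_r \\[2ex]
1 - \dfrac{z_{rit} - Min_r}{Max_r - Min_r} & \text{if } Min_r \le z_{rit} < Max_r \\[2ex]
(Max_r - z_{rit}) - 1 & \text{if } Max_r \le z_{rit}
\end{cases}$$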


The blueprint defines the minimum ($Min_r$) and maximum ($Max_r$) number of items to be
administered with each characteristic ($r$).


The term $m_{it} = \dfrac{T}{T - t}$, where $T$ is the total test length, has the effect of increasing the
algorithm’s preference for items that have not yet met their minimums as the end of the test nears
and the opportunities to meet the minimum diminish.


This increases the likelihood of selecting items for content that has not met its minimum as the 
opportunities to do so are used up. The value s is highest for items with content that has not met 
its minimum, declines for items representing content for which the minimum number of items 
has been reached but the maximum has not, and turns negative for items representing content 
that has met the maximum. 
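The three regimes of the control parameter and the resulting scalar content value can be sketched as follows. This is a minimal illustration, not the production implementation: the function names are hypothetical, the multiplier is taken to be m = T / (T - t) as an assumption, and t is assumed to be smaller than the total test length.

```python
def control_parameter(z, min_r, max_r, t, total_length):
    """Adaptive control parameter s for one blueprint feature.

    z: items with this feature administered so far
    t: items administered so far (assumed t < total_length)
    """
    if z < min_r:
        m = total_length / (total_length - t)  # assumed form of m: T / (T - t)
        return m * (2 - z / min_r)
    if z < max_r:
        return 1 - (z - min_r) / (max_r - min_r)
    return (max_r - z) - 1


def content_value(d, s, p):
    """Scalar content value: sum over features r of d_r * s_r * p_r."""
    return sum(d_r * s_r * p_r for d_r, s_r, p_r in zip(d, s, p))
```

For a constraint with a minimum of four and a maximum of six, halfway through a ten-item test, the parameter is 3.0 with two items administered, 1.0 with four, 0.5 with five, and -1 with six.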


1.2.2 Content Value for Sets of Items 


Calculation of the content value of sets of items is complicated by two factors:


1. The desire to allow more items to be developed for each set and to have the most
advantageous set of items administered

2. The design objective of characterizing the information contribution of a set of items as
the expected information over the working theta distribution for the examinee


The former objective is believed to enhance the ability to satisfy highly constrained blueprints 
while still adapting to obtain good measurement for a broad range of students. The latter arises 
from the recognition that ELA tests will select one set of items at a time, without an opportunity 
to adapt once the passage has been selected. 


The general approach involves successive selection of the highest content value item in the set 
until the indicated number of items in the set have been selected. Because the content value of an 
item changes with each selection, a temporary copy of the already-administered content vector 
for the examinee is updated with each selection such that subsequent selections reflect the items
selected in previous iterations. 


Exhibit 1 presents a flowchart for this calculation. Readers will note the check to determine
whether $w_0 > 0$ or $w_1 > 0$. These weights, defined with Equation 1, identify the user-supplied
importance of information optimization relative to blueprint optimization. In cases such as
independent field tests, these weights may be set to zero, as it may not be desirable to make item
administration dependent on match to student performance. In more typical adaptive cases where
item statistics will not be recalculated, favoring more informative items is generally better. The
final measure of content value for the selected set of items is divided by the number of
items selected to avoid a bias toward selection of sets with more items.


[Exhibit 1 flowchart. Steps shown: initialize ContentValue = 0; create a working copy of the
content status vector; eliminate any item set members that would violate a strict maximum;
initialize i = 0; calculate the content value of each item; if there is no tie for the highest
value, select the highest value item; if there is a tie and w0 > 0 or w1 > 0, select the tied
item with the highest information, otherwise select randomly from among the ties; update the
working copy of the content status vector; add the value of the selected item to ContentValue;
increment i; repeat until i equals the number to administer; return ContentValue/i.]


Exhibit 1. Content Value Calculation for Item Sets
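The greedy calculation for item sets can be sketched in a few lines. The code below is an illustration under stated assumptions: the data layout (dictionaries keyed by item and feature), the function names, and the simplification that the end-of-test multiplier equals 1 are all hypothetical choices made for the example.

```python
import random


def control(z, lo, hi):
    """Simplified control parameter (end-of-test multiplier m taken as 1)."""
    if z < lo:
        return 2 - z / lo
    if z < hi:
        return 1 - (z - lo) / (hi - lo)
    return (hi - z) - 1


def set_content_value(items, counts, blueprint, n_to_administer,
                      info=None, rng=random):
    """Greedy content value for an item set.

    items:     {item_id: set of feature names}
    counts:    {feature: number already administered}
    blueprint: {feature: (min, max, strict_max, priority_weight)}
    info:      optional {item_id: information}; pass it only when the
               information weights are positive, to break ties
    Returns (content value per selected item, selected item ids).
    """
    work = dict(counts)  # working copy of the content status vector

    def violates_strict_max(features):
        return any(blueprint[r][2] and work.get(r, 0) >= blueprint[r][1]
                   for r in features if r in blueprint)

    pool = {i: f for i, f in items.items() if not violates_strict_max(f)}
    total, selected = 0.0, []
    while pool and len(selected) < n_to_administer:
        values = {
            i: sum(control(work.get(r, 0), blueprint[r][0], blueprint[r][1])
                   * blueprint[r][3] for r in f if r in blueprint)
            for i, f in pool.items()
        }
        best = max(values.values())
        ties = [i for i, v in values.items() if v == best]
        if len(ties) > 1 and info is not None:
            pick = max(ties, key=lambda i: info[i])  # most informative tied item
        else:
            pick = rng.choice(ties)                  # random among ties, or the single best
        total += values[pick]
        selected.append(pick)
        for r in pool.pop(pick):                     # update the working copy
            work[r] = work.get(r, 0) + 1
    return (total / len(selected) if selected else 0.0), selected
```

Dividing by the number of selected items mirrors the averaging step described above, so larger sets are not favored simply for being larger.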


1.3 Information Value 


Each item or item group also has value in terms of maximizing information, both overall and on 
reporting categories. 


1.3.1 Individual Information Value 


The information value associated with an item will be an approximation of information. The 
system will be designed to use generalized IRT models; however, it will treat all items as though
they offer equal measurement precision. This is the assumption made by the Rasch model, but in 
more general models, items known to offer better measurement are given preference by many 
algorithms. Subsequent algorithms are then required to control the exposure of the items that 
measure best. Ignoring the differences in slopes serves to eliminate this bias and help equalize 
exposure. 


1.3.2 Binary Items 


The approximate information value of a binary item will be characterized as
$I_j(\theta) = p_j(\theta)\,(1 - p_j(\theta))$, where the slope parameters are artificially replaced with a constant.
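A minimal sketch of this approximation follows. It assumes a logistic response function with no guessing parameter and a constant slope of 1.0; both choices, along with the function name, are assumptions made for illustration.

```python
import math

CONSTANT_SLOPE = 1.0  # assumed constant that stands in for every item's slope


def binary_information(theta, difficulty, slope=CONSTANT_SLOPE):
    """Approximate information p * (1 - p) for a binary item at proficiency theta."""
    p = 1.0 / (1.0 + math.exp(-slope * (theta - difficulty)))
    return p * (1.0 - p)
```

With a shared slope, the value depends only on the distance between the proficiency and the item difficulty, peaking at 0.25 when the two coincide.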


1.3.3 Polytomous Items 


In terms of information, the best polytomous item in the pool is the one that maximizes the
expected information, $I_j(\theta)$. Formally, $I_j(\theta) \ge I_k(\theta)$ for all items $k \ne j$. The true value $\theta$,
however, remains unknown and is accessed only through an estimate, $\hat{\theta} \sim N(\bar{\theta}, \sigma_{\theta})$. By definition
of an expectation, the expected information is $I_j(\theta) = \int I_j(t)\, f(t \mid \bar{\theta}, \sigma_{\theta})\, dt$.


The intuition behind this result is illustrated in Exhibit 2. In Exhibit 2, each panel graphs the
distribution of the estimate of $\theta$ for an examinee. The top panel assumes a polytomous item in
which one step threshold (A1) matches the mean of the $\theta$ estimate distribution. In the bottom
panel, neither step threshold matches the mean of the $\theta$ estimate distribution. The shaded area in
each panel indicates the region in which the hypothetical item depicted in the panel provides 
more information. We see that approximately 2/3 of the probability density function is shaded in 
the lower panel, while the item depicted in the upper panel dominates in only about 1/3 of the 
cases. In this example, the item depicted in the lower panel has a much greater probability of 
maximizing the information from the item, despite the fact that the item in the upper panel has a 
threshold exactly matching the mean of the estimate distribution and the item in the lower panel 
does not. 




[Exhibit 2, top panel: Threshold A1 matches the best current estimate of the proficiency for this
student, but the estimate is not yet very precise.]

[Exhibit 2, bottom panel: Neither threshold matches the best current estimate of the proficiency
for this student, but together they cover more of the proficiency distribution.]


Exhibit 2. Two example items, with the shaded region showing the probability that the item maximizes 
information for the examinee depicted. 


Exhibit 3 shows what happens to information as the estimate of this student’s proficiency becomes more
precise (later in the test). In this case, the item depicted in the top panel maximizes information about
65 to 70 percent of the time, compared to about 30 to 35 percent for the item depicted in the lower panel.
These are the same items depicted in Exhibit 2, but in this case we are considering information for a student
with a more precise current proficiency estimate.




[Exhibit 3, top panel: When the proficiency estimate gets more precise, the item that best matches
the center of the distribution covers most of it.]

[Exhibit 3, bottom panel: As the proficiency distribution becomes more narrow, the item that does
not match the center provides less information.]


Exhibit 3: Two example items, with the shaded region showing the probability that the item maximizes information for 
the examinee depicted. 


The approximate information value of polytomous items will be characterized as the expected
information, specifically

$$E\left[I_j(\theta) \mid m_i, s_i\right] = \int \sum_{k} I_{jk}(t)\, p_j(k \mid t)\, \phi(t; m_i, s_i)\, dt,$$

where $I_{jk}(t)$ represents the information at $t$ of response $k$ to item $j$, $p_j(k \mid t)$ is the probability of
response $k$ to item $j$ (artificially holding slope constant), given proficiency $t$, $\phi(\cdot)$ represents the normal
probability density function, and $m_i$ and $s_i$ represent the mean and standard deviation of
examinee $i$’s current estimated proficiency distribution.


We propose to use Gauss-Hermite quadrature with a small number of quadrature points 
(approximately five). Experiments show that we can complete this calculation for 1,000 items in 
fewer than 5 milliseconds, making it computationally reasonable. 


As with the binary items, we propose to ignore the slope parameters to even exposure and avoid 
a bias toward the items with better measurement. 
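The quadrature step can be sketched with the standard five-point Gauss-Hermite rule. The example below is a hedged illustration: it assumes a partial credit model with slope fixed at 1 and uses the variance of the item score as the item information, which is a simplification chosen for the sketch rather than a statement of the production model. The function names are hypothetical.

```python
import math

# Standard five-point Gauss-Hermite nodes and weights (weight function exp(-x^2)).
GH_NODES = (-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856)
GH_WEIGHTS = (0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591)


def expect_normal(f, mean, sd):
    """Approximate E[f(t)] for t ~ Normal(mean, sd) by Gauss-Hermite quadrature."""
    total = sum(w * f(mean + math.sqrt(2.0) * sd * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def pcm_information(theta, steps):
    """Partial credit model information with slope 1: variance of the item score."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + (theta - b))
    top = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    mean_score = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean_score) ** 2 * p for k, p in enumerate(probs))


def expected_information(steps, mean, sd):
    """Expected information over the examinee's working proficiency distribution."""
    return expect_normal(lambda t: pcm_information(t, steps), mean, sd)
```

Five nodes integrate polynomials up to degree nine exactly against the normal density, which is why a small number of points is usually adequate for smooth information functions.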




1.3.4 Item Group Information Value 


Item groups differ from individual items in that a set of items will be selected for administration. 
Therefore, the goal is to maximize information across the working theta distribution. As with the 
polytomous items, we propose to use Gauss-Hermite quadrature to estimate the expected 
information of the item group. 


In the case of multiple-item groups,

$$E\left[I_g(\theta) \mid m_i, s_i\right] = \frac{1}{J_g} \int \sum_{j=1}^{J_g} I_{g(j)}(t)\, \phi(t; m_i, s_i)\, dt,$$

where $I_g(\cdot)$ is the information from item group $g$ and $I_{g(j)}$ is the information associated with
item $j \in g$, for the $J_g$ items in set $g$. In the case of polytomous items, we use the expected
information, as described above.


2. ENTRY AND INITIALIZATION 


At startup, the system will


• create a custom item pool;
• initialize theta estimates for the overall score and each score point; and
• insert embedded field-test items.


2.1. Item Pool 


At test startup, the system will generate a custom item pool, a string of item IDs for which the
student is eligible. This item pool will include all items that


• are active in the system at test startup; and
• are not flagged as “access limited” for attributes associated with this student.


The list will be stored in ascending order of ID. 


2.2 Adjust Segment Length 


Custom item pools run the risk of being unable to meet segment blueprint minimums. To address 
this special case, the algorithm will adjust the blueprint to be consistent with the custom item 
pool. This capability becomes necessary when an accommodated item pool systematically 
excludes some content. 


Let


S be the set of top-level content constraints in the hierarchical set of constraints, each
consisting of the tuple (name, min, max, n);

C be the custom item pool, each element consisting of a set of content constraints B;

f and p be integers representing the item shortfall and pool count, respectively; and

t be the minimum required items on the segment.


For each s in S, compute n as the number of active operational items in C classified on the
constraint.


f = summation over S of (min − n)
p = summation over S of (n)


if t − f < p, then t = t − f
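The adjustment can be sketched as below. Note one labeled assumption: the sketch clamps each constraint's contribution to the shortfall at zero, so that constraints with surplus items do not offset constraints that fall short. The pseudocode above does not state this clamp, and the function name and data layout are hypothetical.

```python
def adjust_segment_length(constraints, pool, t):
    """Reduce the segment minimum t when the custom pool cannot meet blueprint minimums.

    constraints: list of (name, minimum, maximum) for the top-level constraints
    pool:        list of sets, each holding the constraint names one item is classified on
    t:           minimum required items on the segment
    """
    shortfall = 0
    pool_count = 0
    for name, minimum, _maximum in constraints:
        n = sum(1 for item in pool if name in item)
        shortfall += max(0, minimum - n)   # assumption: no negative shortfall
        pool_count += n
    if t - shortfall < pool_count:
        t = t - shortfall
    return t
```

For example, with two constraints that each require four items, a pool holding only two items for the first constraint yields a shortfall of two, and a ten-item minimum becomes eight.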


2.3 Initialization of Starting Theta Estimates 


The user will supply five pieces of information in the test configuration:

1. A default starting value if no other information is available
2. An indication whether prior scores on the same test should be used, if available
3. Optionally, the test ID of another test that can supply a starting value, along with
4. Slope and intercept parameters to adjust the scale of the value to transform it to the scale
of the target test
5. A constant prior variance for use in calculation of working EAP scores
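A possible reading of how these configuration values combine is sketched below. The order of precedence (same-test prior first, then the other test, then the default) is an assumption, as are the function name and the dictionary layout; the key names follow the parameter names in Appendix A1.

```python
def starting_theta(config, prior_same_test=None, prior_other_test=None):
    """Pick a starting theta from the configured sources (assumed precedence)."""
    if config.get("startPrevious") and prior_same_test is not None:
        return prior_same_test
    if config.get("startOther") and prior_other_test is not None:
        # Linear transformation onto the scale of the target test.
        return config["startOtherSlope"] * prior_other_test + config["startOtherInt"]
    return config["startTheta"]
```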


2.4 Insertion of Embedded Field-Test Items 


Each blueprint will specify


• the number of field-test items to be administered on each test;
• the first item position into which a field-test item may be inserted; and
• the last item position into which a field-test item may be inserted.


Upon startup, select randomly from among the field-test items or item sets until the system has 
selected the specified number of field-test items. If the items are in sets, the sets will be 
administered as a complete set, and this may lead to more than the specified number of items 
administered. 


The probability of selection will be given by $p_j$ = [equation illegible in source], where

$p_j$ represents the probability of selecting the item;

$m$ is the targeted number of field-test items;

$N_j$ is the total number of active items in the field-test pool;

$K_j$ is the number of items in item set $j$; and

$a_j$ is a user-supplied weight associated with each item (or item set) to adjust the relative
probability of selection.


The $a_j$ variables are included to allow for operational cases in which some items must complete
field-testing sooner, or enter field-testing later. While using this parameter presents some 
statistical risk, not doing so poses operational risks. 

For each item set, generate a uniform random number $r_j$ on the interval [0,1]. Sort the items in
ascending order by $r_j / p_j$. Sequentially select items, summing the number of items in the set. Stop
the selection of field-test items once $FTNMin \le m \le FTNMax$, where $m = \sum_{j} K_j$ is the running
total over the selected sets.


Next, each item is assigned to a position on the test. To do so, select a starting position within
$f = FTMax - FTMin$ positions from FTMin, where FTMax is the maximum allowable position
for field-test items and FTMin is the minimum allowable position for field-test items. FTNMin
and FTNMax refer to the minimum and maximum number of field-test items, respectively.
Distribute the items evenly within these positions.
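The random-key selection of field-test sets can be sketched as follows. The selection probabilities are taken as inputs, and skipping a set that would push the total past FTNMax is an assumption of this sketch, not something the text prescribes. Positioning of the selected items is not shown, and the function name and tuple layout are hypothetical.

```python
import random


def select_field_test_sets(sets, ftn_min, ftn_max, rng=random):
    """Select field-test sets by ascending r / p until FTNMin items are reached.

    sets: list of (set_id, k_items, p_select) with p_select > 0
    Returns (selected set ids, total number of items selected).
    """
    # A smaller r / p sorts first, so sets with larger p tend to be picked earlier.
    keyed = sorted(((rng.random() / p, set_id, k) for set_id, k, p in sets),
                   key=lambda row: row[0])
    chosen, total = [], 0
    for _key, set_id, k in keyed:
        if total >= ftn_min:
            break
        if total + k > ftn_max:
            continue  # assumption: skip a set that would overshoot FTNMax
        chosen.append(set_id)
        total += k
    return chosen, total
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which can help when auditing field-test exposure.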


3. ITEM SELECTION 


Exhibit 4 summarizes the item selection process. If the item position has been designated for a
field-test item, administer that item. Otherwise, the adaptive algorithm kicks in.


[Exhibit 4 flowchart. Steps shown: if the position is designated for a field-test item, administer
the field-test item or group; otherwise implement the recycling algorithm; eliminate all items
that exceed strict max designations; calculate content values for all items and groups; sort in
descending order of content value; set cset1 to the top cset1initialsize candidates on the first
item, or to the top cset1size otherwise; calculate information and total value for all members
of cset1; sort in descending order; select randomly from the top cset2size; administer the
selected item or group.]


Exhibit 4: Summary of Item Selection Process


This approach is a “content first” approach designed to optimize match to blueprint. An 
alternative, “information first” approach, is possible. Under an information first approach, all 
items within a specified information range would be selected as the first set of candidates, and 
subsequent selection within that set would be based, in part, on content considerations. The
engine is being designed so that future development could build such an algorithm using many of
the calculations already available. 


3.1. Trimming the Custom Item Pool 


At each item selection, the active item pool is modified in four steps:


1. The custom item pool is intersected with the active item pool, resulting in a custom active
item pool.

2. Items already administered on this test are removed from the custom active item pool.

3. Items that have been administered on prior tests are tentatively removed (see Section 3.2
below).

4. Items that measure content that has already exceeded a strict maximum are tentatively
removed from the pool, removing entire sets containing items that meet this criterion.


3.2 Recycling Algorithm 


When students are offered multiple opportunities to test, or when prior tests have been started 
and invalidated, students will have seen some of the items in the pool. The trimming of the item 
pool eliminates these items from the pool. It is possible that in such situations, the pool may no 
longer contain enough items to meet the blueprint. 


Hence, items that have been seen on previous administrations may be returned to the pool. If 
there are not enough items remaining in the pool, the algorithm will recycle items (or item 
groups) with the required characteristic that is found in insufficient numbers. Working from the 
least recently administered group, items (or item groups) are reintroduced into the pool until the 
number of items with the required characteristics meets the minimum requirement. When item 
groups are recycled, the entire group is recycled rather than an individual item. Items 
administered on the current test are never recycled. 
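The recycling rule for a single under-supplied characteristic might look like the sketch below. The data layout and function name are hypothetical, and the caller is assumed to pass only groups seen on prior tests, ordered from least to most recently administered.

```python
def recycle(pool, seen_groups, feature, minimum):
    """Return the pool plus any recycled groups needed to reach `minimum` items.

    pool:        groups still available, each {"id", "features": set, "n_items": int}
    seen_groups: groups from prior tests, least recently administered first
                 (groups from the current test must not be included)
    """
    have = sum(g["n_items"] for g in pool if feature in g["features"])
    recycled = []
    for group in seen_groups:
        if have >= minimum:
            break
        if feature in group["features"]:
            recycled.append(group)   # the whole group returns, never a single item
            have += group["n_items"]
    return pool + recycled
```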


3.3 Adaptive Item Selection 


Selection of items will follow a common logic, whether the selection is for a single item or an
item group. Item selection will proceed in the following three steps:


1. Select Candidate Set 1 (cset1).
   a. Calculate the content value of each item or item group.
   b. Sort the item groups in descending order of content value.
   c. Select the top cset1size item groups, where cset1size is a user-supplied value that may
      vary by test.
2. Select Candidate Set 2 (cset2).
   a. Calculate the information values for each item group in cset1.
   b. Calculate the overall value of each item group in cset1 as defined in Equation 1.
   c. Sort cset1 in descending order of value.
   d. Select the top cset2size item groups, where cset2size is a user-supplied value that may
      vary by test.
3. Select the item or item group to be administered.
   a. Select randomly from cset2 with uniform probability.


Note that a “pure adaptive” test, without regard to content constraints, can be achieved by setting
cset1size to the size of the item pool and $w_2$, the weight associated with meeting content constraints in
Equation 1, to zero. Similarly, linear-on-the-fly tests can be constructed by setting $w_0$ and $w_1$ to
zero.
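The two-stage candidate selection can be sketched compactly. In this illustration the content value and the overall (Equation 1) value are supplied by the caller as functions, because their computation is described elsewhere; the function name and signature are hypothetical.

```python
import random


def select_next(candidates, content_value, total_value,
                cset1size, cset2size, rng=random):
    """Content-first selection: narrow by content value, then by overall value."""
    # Stage 1: keep the candidates that contribute most to the blueprint match.
    cset1 = sorted(candidates, key=content_value, reverse=True)[:cset1size]
    # Stage 2: within cset1, keep the candidates with the highest overall value.
    cset2 = sorted(cset1, key=total_value, reverse=True)[:cset2size]
    # Stage 3: uniform random choice from the final candidate set.
    return rng.choice(cset2)
```

Setting `cset2size` to 1 makes the choice deterministic, while larger values trade some optimality for exposure control.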




3.4 Selection of the Initial Item 


Selection of the initial item can affect item exposure. At the start of the test, all tests have no
content already administered, so the items and item groups have the same content value for all
examinees. In general, it is a good idea to spread the initial item selection over a wider range of
content values. Therefore, we define an additional user-settable value, cset1initialsize, which is
the size of Candidate Set 1 on the first item only. Similarly, we define cset2initialsize.


3.5 Exposure Control 


This algorithm uses randomization to control exposure and offers several parameters that can be
adjusted to control the tradeoff between optimal item allocation and exposure control. The
primary mechanism for controlling exposure is the random selection from cset2, the set of
items or item groups that best meet the content and information criteria. These represent the “top
k” items, where k can be set. Larger values of k provide more exposure control at the expense of
optimal selection.


In addition to this mechanism, we avoid a bias toward items with higher measurement precision 
by treating all items as though they measured with equal precision by ignoring variation in the 
slope parameter. This has the effect of randomizing over items with differing slope parameters. 
Without this step, it would be necessary to have other post hoc explicit controls to avoid the
overexposure of items with higher slope parameters, an approach that could lead to different test 
characteristics over the course of the testing window. 


4. TERMINATION 


The algorithm will have configurable termination conditions. These may include


• administering a minimum number of items in each reporting category and overall;
• achieving a target level of precision on the overall test score; and
• achieving a target level of precision on all reporting categories.


We will define four user-defined flags indicating whether each of these is to be considered in the
termination conditions (TermCount, TermOverall, TermReporting, TermTooClose). A fifth
user-supplied value will indicate whether these are taken in conjunction or if satisfaction of any one of
them will suffice (TermAnd). Reaching the minimum number of items is always a necessary
condition for termination.


In addition, two conditions will each individually and independently cause termination of the 
test: 


1. Administering the maximum number of items specified in the blueprint 
2. Having no items in the pool left to administer 
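The interplay of the flags can be sketched as a single predicate. This is one possible reading, offered with caveats: the state dictionary and its keys are hypothetical, and treating an empty set of enabled flags as "do not terminate" is an assumption.

```python
def should_terminate(state, flags):
    """Decide whether to stop the test given the current state and configured flags."""
    # Either of these conditions ends the test on its own.
    if state["n_administered"] >= state["max_items"] or state["pool_remaining"] == 0:
        return True
    # The minimum number of items is always a necessary condition.
    if not state["min_items_met"]:
        return False
    checks = []
    if flags.get("TermCount"):
        checks.append(state["min_items_met"])
    if flags.get("TermOverall"):
        checks.append(state["overall_target_met"])
    if flags.get("TermReporting"):
        checks.append(state["reporting_targets_met"])
    if flags.get("TermTooClose"):
        checks.append(state["too_close_to_cut"])
    if not checks:
        return False  # assumption: with no flags enabled, only the hard limits apply
    return all(checks) if flags.get("TermAnd") else any(checks)
```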




A1. DEFINITIONS OF USER-SETTABLE PARAMETERS 


This appendix summarizes the user-settable parameters in the adaptive algorithm. 


Parameter Name | Description | Entity Referred to by Subscript Index
$w_0$ | Priority weight associated with match to blueprint | N/A
$w_1$ | Priority weight associated with reporting category information | N/A
$w_2$ | Priority weight associated with overall information | N/A
$q_k$ | Priority weight associated with a specific reporting category | reporting categories
$p_r$ | Priority weight associated with a feature specified in the blueprint (These inputs appear as a component of the blueprint.) | features specified in the blueprint
$a$ | Parameter of the function $h_1(\cdot)$ that controls the overall information weight when the information target has not yet been hit | N/A
$b$ | Parameter of the function $h_1(\cdot)$ that controls the overall information weight after the information target has been hit | N/A
$c_k$ | Parameter of the function $h_1(\cdot)$ that controls the information weight when the information target has not yet been hit for reporting category $k$ | reporting categories
$d_k$ | Parameter of the function $h_1(\cdot)$ that controls the information weight after the information target has been hit for reporting category $k$ | reporting categories
cset1size | Size of candidate pool based on contribution to blueprint match | N/A
cset1initialsize | Size of candidate pool based on contribution to blueprint match for the first item or item set selected | N/A
cset2size | Size of final candidate pool from which to select randomly | N/A
cset2initialsize | Size of candidate pool based on contribution to blueprint match and information for the first item or item set selected |
$t_0$ | Target information for the overall test | N/A
$t_k$ | Target information for reporting categories | reporting categories
startTheta | A default starting value if no other information is available | N/A
startPrevious | An indication of whether previous scores on the same test should be used, if available | N/A
startOther | The test ID of another test that can supply a starting value, along with startOtherSlope | N/A
startOtherSlope | Slope parameter to adjust the scale of the value to transform it to the scale of the target test | N/A
startOtherInt | Intercept parameter to adjust the scale of the value to transform it to the scale of the target test | N/A
FTMin | Minimum position in which field-test items are allowed | N/A
FTMax | Maximum position in which field-test items are allowed | N/A
FTNMin | Target minimum number of field-test items | N/A
FTNMax | Target maximum number of field-test items | N/A
$a_j$ | Weight adjustment for individual embedded field-test items used to increase or decrease their probability of selection | field-test items
AdaptiveCut | The overall score cutscore, usually proficiency, used in consideration of TermTooClose |
TooCloseSEs | The number of standard errors below which the difference is considered “too close” to the adaptive cut to proceed. In general, this will signal proceeding to a final segment that contains off-grade items. |
TermOverall | Flag indicating whether to use the overall information target as a termination criterion | N/A
TermReporting | Flag to indicate whether to use reporting category information target as a termination criterion | N/A
TermCount | Flag to indicate whether to use minimum test size as a termination condition | N/A
TermTooClose | Terminate if you are not sufficiently distant from the specified adaptive cut |
TermAnd | Flag to indicate whether the other termination conditions are to be taken separately or conjunctively | N/A


A2. API 


This information is forthcoming. 


A3. SUPPORTING DATA STRUCTURES 


AIR Cautions and Caveats

• Use of standard error termination conditions will likely cause inconsistencies between the
blueprint content specifications and the information criteria, which will cause unpredictable
results, likely leading to failures to meet blueprint requirements.

• The field-test positioning algorithm outlined here is very simple and will lead to
deterministic placement of field-test items.




Appendix J: 2013 Pilot Test Report


Smarter Balanced Assessment Consortium


PILOT ANALYSIS SUMMARY OF RESULTS
From: ETS, Smarter Balanced Contract 05: Psychometric Services 
To: Smarter Balanced TD&V Leadership Team and Lead Psychometrician 
Subject: Pilot Test Data Analysis Results: Dimensionality Study and IRT Model Comparison 
Date: April 23, 2014 


Executive Summary 


This memorandum contains the statistical analysis summary of results pertaining to the Smarter Balanced 
Pilot Test, focusing on dimensionality and IRT model investigation. Details related to data processing, 
student samples, item analysis, DIF analysis, and other relevant factors were summarized in a memorandum
to Smarter Balanced in October 2013. 


The purposes and design of the Pilot Test administration were documented in the Pilot Test and Vertical 
Design (Educational Testing Service, 2013). The related data collection design is being repeated here to 
make the conclusions presented in this report more accessible. Caution should be exercised when 
interpreting information from the Pilot Test administration because of the following limitations: 


• The Pilot Test administration used a preliminary version of the Smarter Balanced test blueprints.

• While Pilot tests were being delivered or scored, some items and item types were eliminated.

• Although the initial design was intended to be a representative student sample, the student samples
were largely convenience samples.

• The performance task component underwent significant revision after the Pilot Test so that the
classroom activity will be a required component of the test administration. Classroom activity was
not required during the Pilot.

• The number of scorable performance tasks was very small for some tests, and there were no
surviving performance tasks for the mathematics tests.

• Human scoring was performed on the basis of each item, and each item received a maximum of
1,800 scored responses and as few as 500 scored responses for some item types. As a result, not
all student responses were fully scored and the sparseness of the analysis data matrix was
significant.

• The content data review of the Pilot data for the items was not completed prior to the completion of
the analysis activities. Based on preliminary data review, recommendations were implemented
concerning which items to include or exclude from the item bank. Items were included if they were
not rejected by data review and if they had an item-total correlation no less than 0.15.


The major Pilot statistical analysis activities are item and DIF analyses for CAT items to support data review 
(completed in October 2013), a dimensionality study to explore grade-level and adjacent-grade dimensional 
structure, and IRT analyses to provide a basis for the selection of an IRT model. 


In the Pilot, students took either a CAT test or a combined CAT and performance task (PT) configuration. 
Students taking only CAT components took two (ELA) or three (Math) content representative item collections 
(called modules). Each Math module had 23 selected-response (SR) items and constructed-response (CR) 
items and was expected to require about 45 minutes to complete. An ELA module had about 29 items at
lower grade levels and 33 items at high school grades, and each module was expected to take about 60-75
minutes to complete. All single-selection Selected-Response (SR) items had four choices, and
multiple-selection Selected-Response (MSR) items had 5 to 8 choices. The performance task items had maximum
scores ranging from 0 to 4. In accordance with the test design, other groups of students were administered a
single CAT module and a performance task. A performance task was expected to have approximately five
scorable units yielding approximately 20 score points in total. Overall, 1,602 ELA CAT items, 49 ELA
performance tasks (which included 318 items), and 1,883 Math CAT items were analyzed. No Math
performance tasks were scored or used for any subsequent analyses. These items, in aggregate, represent
ELA and Math content in all Claims.


The majority of the Pilot Test contents (CAT modules and PTs) were administered to students at the grade for 
which the items/tasks were developed (i.e., the on-grade administration of items/tasks). Selected Pilot CAT 
modules and PTs were also administered to students at the upper- or lower-adjacent grade to facilitate 
vertical linking investigation (i.e., the off-grade administration of items/tasks). 


The response data for the items were collected from student samples ranging in size from 12,000 students 
in some high school grades to more than 40,000 in Grades 3 to 8. Though the samples were intended to be 
representative of their respective populations in characteristics such as their 2012 state test performance, 
gender, ethnicity, and special programs, the Pilot Test administration resulted in convenience student 
samples due to administration constraints. Because the representativeness of these samples is unknown, 
any comparisons based on results over grades and generalizations based on results of larger student 
populations should be regarded cautiously. Table 1 below summarizes the item pool sizes and student 
samples for all 18 tests. 


Table 1. Summary of Number of Items and Students 


         ELA                         Mathematics
Grade    Number of    Number of      Number of    Number of
         Items        Students       Items        Students
3        241          41,450         212          41,502
4        236          49,797         214          43,722
5        184          49,522         210          46,406
6        227          49,670         213          42,051
7        210          44,430         230          41,408
8        232          41,132         224          44,650
9        146          25,690         135          19,298
10       157          16,079         139          12,438
11       287          18,904         306          24,405


After receipt of the scored student response data, statistical analyses of students’ responses were 
conducted to gain information about the quality of the test questions. The analyses include several 
components: item difficulty, item discrimination, item response distribution, and differential item functioning 
(DIF). In general, items appeared difficult for the students who participated in the Pilot Test administration. 
Most items had average item score values below 0.5. There was a relatively small number of items that 
showed some performance differences between student groups. In addition to the item level statistics, 
statistics for CAT item collections (modules) were computed, including the number of students taking the 
item collections, reliabilities, and observed score distributions as percentages of the maximum possible 
scores of the item collections (see Table 2 below). The median module score as a percentage of the 
module’s maximum score shows that the items, when appearing as a collection, were on average difficult for 
Pilot administration participants. In general, the on-grade administration of Pilot CAT modules received more 
student responses than the off-grade administration of those test contents. 


Table 2. Summary Statistics for CAT Item Collections (Modules) 


                  Student Samples    Reliability            Percent of Maximum
Subject  Grade    Min      Max       Min    Max    Median   Min    Max    Median
ELA      3        1,369    9,539     0.75   0.86   0.81     34.0   54.8   45.6
         4        1,092    7,426     0.70   0.83   0.77     34.8   54.4   44.8
         5        1,177    9,976     0.64   0.80   0.72     37.0   53.0   45.3
         6        1,278    4,915     0.60   0.80   0.72     37.3   48.3   43.0
         7        1,060    4,534     0.55   0.84   0.72     34.2   50.1   41.3
         8        491      4,331     0.53   0.79   0.69     35.1   46.4   42.4
         9        1,139    4,858     0.50   0.84   0.70     33.4   50.7   42.4
         10       507      2,838     0.64   0.81   0.72     31.4   47.1   36.8
         11       249      1,772     0.59   0.83   0.74     27.1   42.4   33.2
Math     3        1,743    6,199     0.67   0.87   0.79     26.0   51.8   36.8
         4        1,917    4,763     0.67   0.87   0.81     15.8   48.4   36.0
         5        2,062    5,116     0.74   0.86   0.83     23.7   42.5   35.6
         6        1,801    4,498     0.65   0.88   0.79     22.1   45.0   32.5
         7        893      3,642     0.62   0.84   0.79     15.6   36.0   26.1
         8        1,416    5,166     0.59   0.84   0.75     11.6   34.4   25.0
         9        705      3,527     0.58   0.76   0.63     9.9    26.6   20.9
         10       631      2,106     0.54   0.79   0.69     14.7   33.2   20.9
         11       536      2,272     0.52   0.83   0.72     10.0   28.3   18.9
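
The module statistics reported in Table 2 can be illustrated with a small computation. The sketch below assumes coefficient alpha as the reliability index (the summary does not name the coefficient that was used) and works on a made-up score matrix; the function names and the sample data are illustrative only.

```python
import numpy as np

def coefficient_alpha(scores):
    """Coefficient alpha for a students-by-items score matrix (assumed reliability index)."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of module totals
    return (n_items / (n_items - 1)) * (1.0 - item_variances.sum() / total_variance)

def median_percent_of_maximum(scores, max_points):
    """Median module score expressed as a percentage of the maximum possible score."""
    totals = np.asarray(scores, dtype=float).sum(axis=1)
    return 100.0 * np.median(totals) / float(np.sum(max_points))

# Hypothetical module: 5 students, three 0/1 items and one item scored 0-2.
responses = np.array([
    [1, 1, 1, 2],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 2],
])
max_points = [1, 1, 1, 2]

alpha = coefficient_alpha(responses)
percent = median_percent_of_maximum(responses, max_points)
print(f"alpha = {alpha:.2f}, median percent of maximum = {percent:.1f}")
```

A median percent of maximum well below 50, as in most rows of Table 2, is what the text means when it says the modules were difficult for Pilot participants.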


Prior to conducting the dimensionality study and IRT analyses, the items were reviewed by content experts in 
light of these statistics. After the data review, more than 75% of ELA items and more than 83% of Math 
items were deemed appropriate for inclusion in dimensionality study and IRT analyses (except in Grade 9, 
where fewer than 70% of ELA items and fewer than 75% of Math items were included). Using the best 
available information from the Pilot, the evidence suggests that the unidimensional model is consistently 
the preferred model. Therefore, the traditional IRT calibrations and linking can be performed. No changes 
are warranted to the scaling design, and all items for a grade and content area can be calibrated together 
simultaneously. Although a unidimensional model is consistently preferred, differences in dimensionality are 
most evident in Mathematics in the transition from Grade 8 to 9. This difference is somewhat expected since 
this marks the transition into the course-specific content that characterizes high school. 


To support the IRT model selection process, analysis results are presented for IRT calibration evaluation, fit 
comparison, guessing evaluation, common discrimination evaluation, and ability estimates evaluation. Prior 
to conducting these analyses and in addition to establishing some item exclusion rules, a noteworthy 
observation is that score categories for some items have to be pre-treated, because there are fewer than 10 
examinees in some score categories. Of all the items that have received category collapsing due to sparse 
responses (Tables B.1 and B.2), more than 70% of them have fewer than 1,500 valid responses from the 
Pilot Test administration. Because the Field Test, like the Pilot Test, will use newly developed items, it is 
advisable, in order to mitigate the cases of score category collapsing, that the item-level sample sizes be 
larger for the IRT models that Smarter Balanced will adopt for Field Test analyses. 
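
The pre-treatment of sparse score categories can be sketched as follows. The summary states only the threshold (fewer than 10 examinees in a category), so the merging rule used here is an assumption for illustration: a sparse category is merged with the next higher one, a sparse top category is merged downward, and the result is renumbered from 0. The function name is hypothetical.

```python
from collections import Counter

def collapse_sparse_categories(scores, min_count=10):
    """Recode item scores so that each retained category has at least min_count responses.

    Assumed rule (not taken from the report): merge a sparse category upward,
    merge a sparse top category downward, then renumber categories 0, 1, 2, ...
    Returns (recoded_scores, mapping from original to recoded category).
    """
    counts = Counter(scores)
    groups = []  # each group lists the original categories that share one recoded value
    for category in sorted(counts):
        if groups and sum(counts[c] for c in groups[-1]) < min_count:
            groups[-1].append(category)   # previous group is still sparse: merge upward
        else:
            groups.append([category])
    # A sparse top group has nothing above it, so merge it downward instead.
    if len(groups) > 1 and sum(counts[c] for c in groups[-1]) < min_count:
        top = groups.pop()
        groups[-1].extend(top)
    mapping = {c: new for new, group in enumerate(groups) for c in group}
    return [mapping[s] for s in scores], mapping

# Hypothetical item scored 0-3 with only 3 responses in category 2.
scores = [0] * 50 + [1] * 40 + [2] * 3 + [3] * 20
recoded, mapping = collapse_sparse_categories(scores)
print(mapping)
```

Larger item-level samples make such merging less likely, which is the motivation for the sample-size recommendation above.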


The model comparison analysis results with Pilot Test data suggest that the 2PL/GPC model combination 
should be adopted as the IRT model combination for calibrating Smarter Balanced items and establishing 
vertical scales. The 2PL/GPC model provides flexibility for estimating a range of item discriminations, without 
the complications of implementing a 3PL/GPC model. The major limitation of the 2PL/GPC model in this 
setting is that it has not been previously used for vertical scaling in K-12 assessments. This 
recommendation should be evaluated with caution given the experimental nature and limitations of the Pilot 
data, the possible change of item formats from Pilot to Field Test to operational administration, and the lack 
of information about vertical scaling results for the three models. 
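
For orientation, the candidate unidimensional response functions can be written compactly. The sketch below uses the standard logistic forms of the 2PL, 3PL, and generalized partial credit (GPC) models with no scaling constant; the parameter names (a, b, c, and the list of step difficulties) are generic textbook notation and are not taken from the calibration software.

```python
import math

def irf_2pl(theta, a, b):
    """2PL probability of a correct response: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_3pl(theta, a, b, c):
    """3PL adds a lower asymptote c (pseudo-guessing) to the 2PL curve."""
    return c + (1.0 - c) * irf_2pl(theta, a, b)

def gpc_probabilities(theta, a, steps):
    """GPC category probabilities for scores 0..len(steps); steps are step difficulties."""
    exponents = [0.0]                      # score 0 has exponent 0
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    peak = max(exponents)                  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# With a single step, the GPC model reduces to the 2PL model.
print(irf_2pl(0.5, 1.2, 0.0), gpc_probabilities(0.5, 1.2, [0.0])[1])
```

The 2PL/GPC pairing lets discrimination vary by item while leaving out the lower-asymptote parameter that the 3PL model would add.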




Background 


The Pilot Test administration was designed to collect data on items and tasks before they are used 
operationally and to make adjustments to test and item specifications based on the data collected. 
The Pilot Test was also intended to familiarize states, schools, teachers, and students with the kinds 
of items and tasks that will be a part of the Smarter Balanced summative assessments. All Pilot Test 
contents (CAT¹ items and PT) were administered via computer to a sample of students from across 
the member states of the consortium. The student responses were scored by automated computer 
algorithm, artificial intelligence scoring, or human raters. Multiple raters scored the same student 
response when the items were designed to be human scored. 


Students participating in the Pilot Test took either a Mathematics test or an ELA test, including both 
the CAT and the PT components. The combination of a CAT component and a PT component covered 
the full content standards for the Pilot Test blueprints. 


The CAT component consists of single-selection Selected-Response (SR) items, multiple-selection 
Selected-Response (MSR) items, and Constructed-Response (CR) items that will eventually 
contribute to the operational Summative and Interim CAT item pools. During the Pilot, the CAT 
component items were administered as linear tests via computer. CAT items were arranged into 
collections called modules, which were the basis for all analyses. Each CAT module of Pilot items 
contained only the number of items needed to cover the content standards, 
ranging from 23 to 33 items. Each participating student was administered at least one on-grade CAT 
module. Items were targeted at a given grade level for on-grade calibration. Items were also given to 
adjacent, off-grade students for vertical scaling. 


A performance task (PT) is a collection of related items belonging to a common theme that consists 
of multiple items/observations and corresponding scores. Scores on the PT items ranged from 0 to 
4. Most students were administered individually based performance tasks. Some were administered 
classroom-based performance tasks that contain some provision for classroom collaboration. An 
individually based performance task required that students approach the task independently without 
preparatory activities. A classroom-based performance task entails classroom activities or student 
interactions concerning a shared task. Although small-group work may be involved in some part of a 
task, it will not be scored, and preparatory activities were standardized to the maximum extent 
possible. All Pilot performance tasks were developed with a detachable classroom activity which 
means a PT could be administered with or without the classroom activity portion. Each item 
configuration was treated as a unique item for purposes of analysis. There was not enough 
information and data to compare the properties of the classroom and individual versions of the same 
performance tasks. 


¹ Note that CAT refers to linear (fixed-form) administrations of the items in the item pool that will 
eventually be used for the computerized adaptive administration. 
There are three Claims in the case of Math and four in ELA. Note that in the CAT and performance 
task sample the Claims appear in different proportions, with some Claims overlapping. Tables 3 and 
4 show the content assignments for ELA and Mathematics and the associated reporting categories. 


Table 3. ELA Content Structure 


Component     Claims                Score Reporting Category
CAT Module    Reading               Literary
                                    Informational
              Writing               Purpose/Focus/Organization
                                    Evidence
                                    Conventions
              Speaking/Listening    Listen/Interpret
              Research              Research
PT            Writing               Purpose/Focus/Organization
                                    Evidence
                                    Conventions
              Research              Research


Table 4. Math Content Structure 


Component     Claims                                        Score Reporting Category
CAT Module    Concepts and Procedures                       Domain 1
                                                            Domain 2
              Problem Solving                               Problem Solving
              Modeling and Data Analysis                    Modeling Data
              Communicating/Reasoning                       Communicating/Reasoning
PT            Problem Solving/Modeling and Data Analysis    Problem Solving
                                                            Modeling Data
              Communicating/Reasoning                       Communicating/Reasoning


For the purpose of vertical scaling, the CAT component and PT component assigned to a student in a 
given grade could be on-grade or off-grade from either the adjacent-lower or upper grade. See Figure 
1 for a depiction of content assignment by grade. The off-grade content was determined by content 
experts to be grade-level appropriate and representative of the construct; this also minimized 
opportunity-to-learn concerns to the maximum extent possible. 


[Figure: Examinees by Grade]

Figure 1. Test Content Designation by Grade 


Various data collection designs were proposed for the Pilot testing in 2013 (see Pilot Test and 
Vertical Design, Educational Testing Service, 2013). In accordance with the testing times that were 
established for the CAT and PT components and because of the need to control the number of items 
each student will take, the following Pilot Test data collection designs were adopted for Pilot Test 
administration. 


• ELA tests will use the combination of alternate design variation 4 (Figure 2) and the 
supplemental design (Figure 4). Student samples selected for taking the ELA Pilot Tests will 
take either two CAT modules or one CAT module and one randomly spiraled performance 
task. 

• Mathematics tests will use the combination of alternate design variation 3 (Figure 3) and the 
supplemental design (Figure 4). Student samples selected for taking the math tests will take 
either three CAT modules or one CAT module and one randomly spiraled performance task. 


[Figure: data collection design grid. Row labels: CAT Modules (on-grade); CAT Modules (off-grade); PT at Student Level; PT at Classroom Level. Cell label: Spiraling on-, upper-, and lower-grade PT]

Figure 2. Alternate Data Collection Design Variation 4 (Adopted for ELA) 


[Figure: data collection design grid. Row labels: CAT Modules (on-grade); CAT Modules (off-grade); PT at Student Level; PT at Classroom Level. Cell label: Spiraling on-, upper-, and lower-grade PT]

Figure 3. Alternate Data Collection Design Variation 3 (Adopted for Mathematics) 


[Figure: supplemental design grid titled "Additional Performance Tasks". Row labels: CAT Modules (on-grade); CAT Modules (off-grade); PT at Student Level; PT at Classroom Level. Cell labels: P11, P13, and Spiraling on-, upper-, and lower-grade PT]

Figure 4. Supplemental Pilot Design (Adopted for both ELA and Mathematics) 


The Pilot Tests were administered to students from February to May 2013. Student responses were 
scored in phases to facilitate analyses and item data review. The first phase of scored data files to 
support CAT item analyses and data review became available in October 2013. Item and DIF 
analyses were completed in October to support item data review. The second phase of complete 
data files with PT scored responses became available in January 2014; these were used to conduct 
the IRT and dimensionality related analyses that are documented in the later sections of this 
document. 


Caution should be exercised when interpreting the Pilot analysis results due to the following 
constraints observed in the data in addition to those general limitations mentioned in the Executive 
Summary. 


• The students tested in the Pilot administration constituted a convenience sample of the 
consortium. 

• Most items that were administered in the Pilot administration were deemed not suitable for 
operational administration. 

• Not all student responses were scored. The scoring had a maximum limit of 1,800 responses 
per item. 

• Not all responses by a student were scored, which means a student could have answered 50 
items but only 30 were scored. This is the combined effect of the limit of 1,800 scored responses 
per item and of scoring by item rather than by complete student record. 

• Some content that was designed to be administered off-grade was not administered. 

• In some cases, there were no performance tasks in the vertical linking anchors because 
most performance tasks were determined to be not scorable. 


The following sections contain the procedures and results related to the dimensionality study and IRT 
model comparison. 


1. Dimensionality Study 


Before undertaking the Pilot calibration and scaling, Smarter Balanced sought insight concerning 
test dimensionality that will affect the IRT scaling design and ultimately the composite score that 
denotes overall student proficiency. This section describes the procedures used and outcomes 
pertaining to the dimensionality study based on the Pilot Test administration. 


Math and ELA are scaled using multidimensional IRT for Grades 3 to 11, both within grade and 
across (adjacent) grades. Due to the mixed format data for the Smarter Balanced assessments 
containing SR and CR items, both unidimensional and multidimensional versions of the 2PL (M-2PL) 
and 2PPC (Yen, 1993) (M-2PPC) IRT scaling models are used. Both unidimensional and 
multidimensional models are compared using a number of model fit measures. 


1.1 Rationale and Approach 


As a factor analytic approach, multidimensional IRT (MIRT) is used to examine the dimensional 
structure. Table 5 below shows that there are two components to the dimensionality to be evaluated. 
The first component pertains to assessing the degree to which essential unidimensionality is met 
within a single grade and content area. The second aspect concerns the degree of invariance in the 
construct across two adjacent grades. Both criteria can be met or violated. A multidimensional 
composite of scores can be identified, but it should be consistent across grades in order to best 
support unidimensional scoring (Reckase, Ackerman, & Carlson, 1988). 


Table 5. Dimensionality Analysis in the Context of Vertical and Horizontal Scaling 


                                   Construct Consistent Across Grades
                                   Violated      Satisfied
Unidimensionality    Violated      0,0           0,1
Within Grade         Satisfied     1,0           1,1


The dimensionality of the Pilot Test data is studied using MIRT. This MIRT approach has a number of 
advantages. First, MIRT is very close to the more familiar unidimensional IRT scaling techniques. This 
approach can utilize familiar unidimensional models as a starting point for model comparison. The 
baseline model is the unidimensional case, against which other candidate models can be compared. 
Second, from a practical perspective the sparse data matrix used for unidimensional scaling can be 
leveraged without the need to create other types of data structures (i.e., covariance matrices). In 
addition, further insight can be obtained with respect to the vertical scaling. Using exploratory 
approaches, the shift in the nature of the construct across levels can be inspected across adjacent 
grade levels. Factor analysis here is primarily confirmatory in nature. The primary focus is the Claim 
structure for ELA and Mathematics. Simple structure refers to loading on a specified factor in a 
confirmatory approach. Complex structure refers to freeing items to load on multiple factors using an 
exploratory approach. By using an exploratory approach, the dimensional structure can be evaluated 
graphically using item vectors. Global fit comparison will be undertaken to arrive at a preferred 
model that will be used to determine the scaling approach and the resulting score reporting. Both 
the overall model test fit (e.g., Bayesian Information Criterion) and graphical depictions using item 
vectors can be utilized in evaluating the factor structure. 
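
Item vectors in a multidimensional solution are commonly summarized by their length and direction. A minimal sketch follows, assuming the usual multidimensional discrimination (MDISC) and direction-angle definitions from the MIRT literature and a logit equal to the discrimination-weighted abilities minus a difficulty term. The function name and the sample parameter values are hypothetical.

```python
import math

def item_vector(discriminations, difficulty):
    """Summarize an item as a vector in the ability space.

    Returns (mdisc, angles_in_degrees, mdiff), where
      mdisc  = length of the discrimination vector,
      angles = angle between the item vector and each coordinate axis,
      mdiff  = signed distance from the origin to the point of steepest slope,
               assuming logit = sum(a_k * theta_k) - difficulty.
    """
    mdisc = math.sqrt(sum(a * a for a in discriminations))
    angles = []
    for a in discriminations:
        cosine = max(-1.0, min(1.0, a / mdisc))   # clamp against rounding error
        angles.append(math.degrees(math.acos(cosine)))
    mdiff = difficulty / mdisc
    return mdisc, angles, mdiff

# Hypothetical two-dimensional item that loads more on the second dimension.
mdisc, angles, mdiff = item_vector([0.6, 0.8], 0.5)
print(round(mdisc, 3), [round(x, 1) for x in angles], round(mdiff, 3))
```

When most item vectors in two adjacent grades point in a similar direction, the same composite is being measured, which is the pattern that supports a unidimensional vertical scale.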


The final reporting scale will be based on the Field Test administration. The Field Test and future 
operational administration will better reflect student performance while schools are in the process of 
transitioning instruction to the Common Core State Standards. The best case would be to replicate 
these findings in operational administrations. 


1.2 Proposed Factor Models 


The analysis consisted of two phases. The first phase examined each grade and content area 
separately (i.e., dimensionality within grade). The second phase investigated the dimensionality of 
two adjacent grade levels that contained unique grade specific items and common “vertical” linking 
items. The first step is a within-grade scaling. The results of the within-grade analysis will be 
evaluated before proceeding on to the across grades vertical linking. The next step is to concurrently 
scale two adjacent-grade tests and examine the resulting structure where a unidimensional multi- 
group model is implemented (Bock & Zimowski, 1997). The adjacent-grade levels have vertical 
linking items in common across grade groups. The choice in a candidate model can be assessed 
using the Bayesian or Akaike Information Criterion (BIC/AIC) measures of global fit. The following factor 
models were proposed: 


1) Unidimensional Models: The baseline model for comparison is the unidimensional version. 
Since unidimensional models are more constrained versions of multidimensional ones, MIRT 
software can be used to estimate them as well. The unidimensional versions will be 
implemented with the same calibration software to afford a similar basis of comparison with 
other multidimensional models. 

2) Multidimensional Models 
• Exploratory Models (Complex Structure). The exploratory models “let the data speak” by 
adopting a complex structure in which items are permitted to load freely on multiple 
factors. Consistent with the approach outlined for unidimensional models, the first phase 
will examine each grade and content area separately (within-grade configuration). The 
next step is to concurrently scale two adjacent-grade test levels and examine the 
resulting structure. Using a two-dimensional exploratory model, item vectors can be 
evaluated graphically. An important aspect will be to note the direction of items and the 
composite vectors. If the same composite of factors is consistently present across grade 
levels, this will support the use of unidimensional approaches and the construction of the 
vertical scale. 

• Confirmatory Models (Simple Structure). Confirmatory models specify the loading of 
items on the factors, referred to as simple structure, according to specified criteria. Two 
types of confirmatory models will be investigated. 

A. Claim Structure. This model evaluates factors according to the Claim structure for 
each content area. For example, four Claims for Math are: Concepts & Procedures 
(Domain 1 and Domain 2), Problem Solving, Modeling, and Communicating & 
Reasoning. A four factor model also results in ELA: Reading, Writing, 
Speaking/Listening, and Research. 

B. Bifactor Model. A bifactor model is proposed in which an overall (general) factor is specified 
along with two or more minor ones. The minor factors will correspond to the Claim 
structure at each grade. Figure 5 depicts a bifactor model 
consisting of a major factor and minor ones. 


Figure 5. An Example of the Bifactor Model with Four Minor Factors Corresponding to Claims. 

In total, four different models were evaluated for each grade and content area, both within and across 
grades. The model and analysis configurations are summarized in Table 6 for the within-grade 
and the across-grade analyses. 


Table 6. Summary MIRT Analysis Configuration 


Model                 Configuration    Content Areas    Grades    Total
Unidimensional        Within grade     2                9         18
                      Across grades    2                8         16
Multidimensional
  Exploratory         Within grade     2                9         18
                      Across grades    2                8         16
  Claim Structure     Within grade     2                9         18
                      Across grades    2                8         16
  Bifactor            Within grade     2                9         18
                      Across grades    2                8         16
Total                                                             170
1.3 MIRT Scaling Models 


With mixed data present in the Pilot Test, different types of IRT scaling models must be chosen. For 
SR items, the two-parameter logistic (2PL) model will be used or the M-2PL (McKinley & Reckase, 
1983a) in the case of the multidimensional version. For CR items, which include all polytomous data, 
the two-parameter partial-credit model (2PPC) will be used. Likewise, for the dimensionality analysis 
the multidimensional two-parameter partial-credit model (M-2PPC) will be used (Yao & Schwarz, 
2006). The multidimensional models used are compensatory in nature since high values for one 
theta (factor) can balance or help compensate for low ones in computing the probability of a 
response to an item for a student. The MIRT models chosen for the dimensionality analysis 
correspond to unidimensional models used for horizontal and vertical scaling of the Pilot Test. The 
M-2PL model for selected response is given below. 


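\[
P_{ij1} = P(x_{ij} = 1 \mid \vec{\theta}_i, \vec{\beta}_j)
        = \frac{1}{1 + e^{-\left(\vec{\beta}_{2j} \odot \vec{\theta}_i^{T} - \beta_{\delta j}\right)}}
\]

where $\vec{\beta}_{2j} = (\beta_{2j1}, \ldots, \beta_{2jD})$ is a vector of dimension $D$ corresponding to the item discrimination 
parameters, $\beta_{\delta j}$ is a scale difficulty parameter, and $\vec{\beta}_{2j} \odot \vec{\theta}_i^{T} = \sum_{l=1}^{D} \beta_{2jl}\,\theta_{il}$. For a polytomously scored item 
$j$, the probability of a response $k-1$ for an examinee with ability $\vec{\theta}_i$ is given by the multidimensional 
version of the 2-PPC model (Yao & Schwarz, 2006) 

\[
P_{ijk} = P(x_{ij} = k-1 \mid \vec{\theta}_i, \vec{\beta}_j)
        = \frac{e^{(k-1)\,\vec{\beta}_{2j} \odot \vec{\theta}_i^{T} - \sum_{t=1}^{k} \beta_{\delta t j}}}
               {\sum_{m=1}^{K_j} e^{(m-1)\,\vec{\beta}_{2j} \odot \vec{\theta}_i^{T} - \sum_{t=1}^{m} \beta_{\delta t j}}}
\]

where $x_{ij} = 0, \ldots, K_j - 1$ is the response of examinee $i$ to item $j$, $\beta_{\delta k j}$ for $k = 1, 2, \ldots, K_j$ are threshold parameters, 
$\beta_{\delta 1 j} = 0$, and $K_j$ is the number of response categories for the $j$th item. 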
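
The two response models can be evaluated numerically as a check on their behavior. The sketch below assumes the forms described in the text: for the M-2PL, a logit equal to the discrimination-weighted sum of abilities minus the difficulty; for the M-2PPC, category exponents built from that weighted sum and cumulative thresholds, with the first threshold fixed at 0. Function and argument names are illustrative and are not BMIRT's.

```python
import numpy as np

def m2pl_probability(theta, beta2, beta_delta):
    """M-2PL probability of a correct response (assumed logit: beta2 . theta - beta_delta)."""
    z = np.dot(beta2, theta) - beta_delta
    return 1.0 / (1.0 + np.exp(-z))

def m2ppc_probabilities(theta, beta2, thresholds):
    """M-2PPC probabilities for scores 0..K-1; thresholds[0] is the fixed zero threshold."""
    thresholds = np.asarray(thresholds, dtype=float)
    score = np.arange(len(thresholds))                    # k - 1 in the formula
    exponents = score * np.dot(beta2, theta) - np.cumsum(thresholds)
    exponents = exponents - exponents.max()               # numerical stability
    weights = np.exp(exponents)
    return weights / weights.sum()

# Compensation: a high value on one dimension offsets a low value on the other.
beta2 = np.array([1.0, 1.0])
print(m2pl_probability(np.array([1.0, -1.0]), beta2, 0.0))   # same as theta = (0, 0)
print(m2ppc_probabilities(np.array([0.5, 0.5]), beta2, [0.0, -0.5, 1.0]))
```

With two categories and thresholds (0, b), the M-2PPC probability of the higher score equals the M-2PL probability with difficulty b, so the two models are consistent for dichotomous items.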


1.4 Software and System Requirements 


A scaling approach is needed that can implement models associated with mixed item types and one 
that makes provisions for missing data “not presented” by design. This “not-presented or not- 
reached” option is necessary since any student only takes a very small subset of the total available 
items. To be practical, the factor analysis needed to use the same data structures used for the 
traditional unidimensional IRT modeling. BMIRT implements the wide variety of scaling models 
needed for mixed item types. The program also produces model fit statistics and supports multigroup (i.e., 
across-grade) analysis. The BMIRT program (Yao, 2003) implements a full Bayesian approach to 
parameter estimation that uses the Metropolis-Hastings algorithm. Running the program from batch files 
makes BMIRT efficient to apply across many grades and content areas. 
The R package plink (Weeks, 2010) performs multidimensional linking and other functions 
such as plotting item characteristic curves. Other supporting R programs were developed 
to check the stationarity of the Markov Chain Monte Carlo (MCMC) chains. 


For parameter estimation, 1,000 MCMC iterations were used with 250 discarded for the MCMC 
burn-in. The resulting item parameters were then used as start values for another 1,000 MCMC 
cycles; 250 were discarded from these iterations as well. This second set of iterations was used
to compute the final parameter estimates. Note that 0.4 was used for the covariance for the prior 
ability functions (abilityPriorCov). Values of 0.0 corresponding to no relationship between factors and 
0.8 indicating high correlations between factors were also evaluated. The difference in fit using 
these two other values was very small compared with the covariance of 0.4. BMIRT program defaults 
were used for other priors or proposal functions. 
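
The two-stage chain with burn-in described above can be illustrated with a generic random-walk Metropolis-Hastings sampler. This is a minimal sketch only and is not BMIRT's implementation: the one-dimensional standard normal target, the proposal width, the use of the retained mean as the stage-one start value, and all function names are assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_density, start, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def standard_normal_log_density(x):
    # Hypothetical target, known only up to an additive constant.
    return -0.5 * x * x


N_ITER, BURN_IN = 1000, 250

# Stage 1: run 1,000 cycles, drop the first 250, summarise the rest.
stage1 = metropolis_hastings(standard_normal_log_density, start=5.0, n_iter=N_ITER)
start_value = sum(stage1[BURN_IN:]) / (N_ITER - BURN_IN)

# Stage 2: restart from the stage-1 summary and repeat; these draws give the estimate.
stage2 = metropolis_hastings(
    standard_normal_log_density, start=start_value, n_iter=N_ITER, seed=1
)
retained = stage2[BURN_IN:]
final_estimate = sum(retained) / len(retained)
```

Each stage keeps 750 of its 1,000 draws. Restarting from the first-stage summary means the second chain begins near the region of high posterior density, so its burn-in mainly guards against residual dependence on the start value.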


1.5 Evaluation of the Number and Types of Dimensions and MIRT Item Statistics 


A primary method for evaluating models is to use overall test fit indices. The Bayesian information
criterion (BIC; Schwarz, 1978) and the Akaike information criterion (AIC; Akaike, 1973) provided by
BMIRT are given below.


$$\mathrm{BIC} = G^2 + 2\log(N)\,df$$
$$\mathrm{AIC} = G^2 + 2\,df$$


where $G^2$ is the likelihood and $2\log(N)\,df$ and $2\,df$ are penalties imposed for adding extra
parameters to the model. These fit statistics can be used to compare either nested or non-nested
models. Lower values of AIC and BIC indicate a better fitting candidate model. A preferred factor
structure results when it demonstrates the minimum fit value among several competing models. This
permits comparison of model fit between unidimensional and multidimensional versions. Despite 
considerable advances in the estimation of a variety of complex models, no clear criteria exist for 
model acceptance. Several criteria will be evaluated to determine if the expected inferences are 
supported. This process of model choice is somewhat judgmental. To warrant the expense and 
operational complications involved in implementing a multidimensional scaling model, the 
preponderance of information would need to demonstrate the data are strongly multidimensional 
and that this multidimensionality varies over grades. 
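
As a worked check of the AIC formula, the sketch below uses the Grade 3 ELA unidimensional entries from Table 7 (likelihood -748,655, df 41,809, AIC 1,580,927, BIC 1,941,833). It assumes that $G^2$ equals -2 times the tabled likelihood value; the text describes $G^2$ only as "the likelihood", so this reading is an assumption, as are the function names.

```python
def deviance(log_likelihood):
    """G^2, assumed here to be -2 times the tabled likelihood value."""
    return -2.0 * log_likelihood


def aic(log_likelihood, df):
    """AIC = G^2 + 2 df."""
    return deviance(log_likelihood) + 2.0 * df


def implied_bic_penalty_per_parameter(bic, log_likelihood, df):
    """Back out the per-parameter penalty that a reported BIC implies."""
    return (bic - deviance(log_likelihood)) / df


# Grade 3 ELA, unidimensional model (values as reported in Table 7).
grade3_aic = aic(-748_655, 41_809)
grade3_penalty = implied_bic_penalty_per_parameter(1_941_833, -748_655, 41_809)
```

Under this assumption the computed AIC agrees with the tabled 1,580,927 to within one unit of rounding. The second function recovers the per-parameter BIC penalty implied by the tabled values, which can be compared with the penalty term in the BIC formula for a given N.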


In Tables 7 and 8, AIC, BIC, the likelihood, and degrees of freedom (df) are presented for ELA and
Mathematics. These tables show the overall fit by grade configuration (within-grade) for the
unidimensional, exploratory, Claim Scores, and bifactor models. The second set of global fit
measures, in Tables 9 and 10, shows the across-grade analysis, in which data from two adjacent
grades are used. Fit measures are given overall (across the adjacent grades) and for each grade
separately. Based on AIC and BIC, the unidimensional model is consistently the preferred model.
Somewhat surprisingly, the bifactor model did not improve on the fit given by the Claim Scores
model.


For example, using Grade 3 ELA, the value of AIC for the unidimensional model was 1,580,927, 
which is lower than the values for the exploratory, Claims scores, and bifactor models. The values for 
BIC are larger by definition and follow the same pattern as AIC with the unidimensional model as the 
preferred candidate model. For the across-grade fit that contained vertical linking items, the
unidimensional model was also substantiated. The comparative fit across-grade models followed the
same pattern as the within-grade analysis. 


Table 7. Fit Measures for ELA within Grade 


Grade  Model           AIC        BIC        Likelihood  df
3      Unidimensional  1,580,927  1,941,833  -748,655    41,809
       Exploratory     1,637,492  2,355,936  -735,518    83,228
       Claim Scores    1,736,151  3,169,699  -702,006    166,069
       Bifactor        1,847,184  3,638,297  -716,101    207,491
4      Unidimensional  1,671,889  2,113,952  -785,799    50,145
       Exploratory     1,743,604  2,624,177  -771,915    99,887
       Claim Scores    1,874,179  3,631,798  -737,715    199,374
       Bifactor        2,003,755  4,199,911  -752,758    249,119
5      Unidimensional  1,269,024  1,707,627  -584,728    49,784
       Exploratory     1,338,209  2,212,463  -569,872    99,233
       Claim Scores    1,471,467  3,217,049  -537,599    198,134
       Bifactor        1,600,465  3,781,726  -552,647    247,586
6      Unidimensional  1,422,993  1,863,403  -661,524    49,972
       Exploratory     1,500,371  2,378,010  -650,603    99,583
       Claim Scores    1,639,063  3,391,185  -620,724    198,808
       Bifactor        1,763,784  3,953,161  -633,470    248,422
7      Unidimensional  1,310,456  1,699,799  -610,484    44,744
       Exploratory     1,372,121  2,147,449  -596,958    89,102
       Claim Scores    1,488,947  3,036,270  -566,652    177,824
       Bifactor        1,605,914  3,539,248  -580,775    222,182
8      Unidimensional  1,282,613  1,640,099  -599,857    41,450
       Exploratory     1,344,545  2,056,117  -589,766    82,506
       Claim Scores    1,457,239  2,877,012  -563,999    164,621
       Bifactor        1,561,028  3,334,914  -574,834    205,680
9      Unidimensional  723,096    934,581    -335,611    25,937
       Exploratory     760,617    1,181,126  -328,737    51,572
       Claim Scores    835,337    1,673,916  -314,823    102,845
       Bifactor        898,965    1,946,592  -320,999    128,483
10     Unidimensional  486,630    612,020    -226,999    16,316
       Exploratory     511,248    759,593    -223,314    32,310
       Claim Scores    552,276    1,046,435  -211,837    64,301
       Bifactor        597,408    1,214,505  -218,406    80,298
11     Unidimensional  724,846    877,585    -342,958    19,465
       Exploratory     745,309    1,045,794  -334,360    38,294
       Claim Scores    795,682    1,391,686  -321,886    75,955
       Bifactor        837,513    1,581,289  -323,969    94,787




Table 8. Fit Measures for Mathematics within Grade 


Grade  Model           AIC        BIC        Likelihood  df
3      Unidimensional  1,243,707  1,595,231  -581,019    40,835
       Exploratory     1,293,666  1,993,571  -565,528    81,305
       Claim Scores    1,415,106  2,811,801  -545,305    162,248
       Bifactor        1,521,203  3,266,305  -557,881    202,721
4      Unidimensional  1,361,780  1,744,943  -636,775    44,115
       Exploratory     1,420,052  2,182,896  -622,197    87,829
       Claim Scores    1,560,890  3,083,122  -605,185    175,260
       Bifactor        1,671,350  3,573,288  -616,698    218,977
5      Unidimensional  1,614,121  2,023,211  -760,281    46,780
       Exploratory     1,664,992  2,479,752  -739,327    93,169
       Claim Scores    1,818,934  3,445,061  -723,517    185,950
       Bifactor        1,919,462  3,951,285  -727,389    232,342
6      Unidimensional  1,245,624  1,612,386  -580,395    42,417
       Exploratory     1,301,437  2,031,746  -566,257    84,462
       Claim Scores    1,444,817  2,902,243  -553,853    168,555
       Bifactor        1,540,013  3,361,011  -559,403    210,603
7      Unidimensional  1,123,242  1,476,898  -520,561    41,060
       Exploratory     1,186,090  1,889,973  -511,323    81,722
       Claim Scores    1,318,147  2,722,512  -496,025    163,049
       Bifactor        1,419,308  3,173,927  -505,940    203,714
8      Unidimensional  1,182,794  1,574,880  -546,363    45,034
       Exploratory     1,243,004  2,023,755  -531,827    89,675
       Claim Scores    1,398,606  2,956,713  -520,343    178,960
       Bifactor        1,496,807  3,443,605  -524,800    223,604
9      Unidimensional  516,180    670,072    -238,530    19,560
       Exploratory     536,809    842,454    -229,557    38,848
       Claim Scores    612,138    1,221,311  -228,642    77,427
       Bifactor        648,848    1,409,797  -227,706    96,718
10     Unidimensional  367,643    462,355    -171,071    12,750
       Exploratory     382,795    569,806    -166,223    25,175
       Claim Scores    425,940    797,570    -162,942    50,028
       Bifactor        454,729    918,679    -164,909    62,456
11     Unidimensional  505,284    703,857    -228,087    24,555
       Exploratory     543,836    936,293    -223,388    48,530
       Claim Scores    630,439    1,410,687  -218,736    96,483
       Bifactor        683,748    1,657,903  -221,413    120,461




Table 9. Fit Measures for ELA across Adjacent Grades 


Grades   Model           Group    AIC        BIC        Likelihood  df
3 to 4   Unidimensional  Overall  3,255,135  4,123,262  -1,535,423  92,145
         Unidimensional  3        1,582,366  1,944,195  -749,267    41,916
         Unidimensional  4        1,672,770  2,115,573  -786,156    50,229
         Exploratory     Overall  3,381,393  5,108,951  -1,507,330  183,367
         Exploratory     3        1,637,806  2,357,468  -735,534    83,369
         Exploratory     4        1,743,587  2,625,139  -771,796    99,998
         Claim Scores    Overall  3,703,214  7,149,559  -1,485,804  365,803
         Claim Scores    3        1,734,620  3,169,972  -701,032    166,278
         Claim Scores    4        1,968,594  3,727,544  -784,772    199,525
         Bifactor        Overall  4,057,828  8,363,605  -1,571,889  457,025
         Bifactor        3        1,850,234  3,643,444  -717,383    207,734
         Bifactor        4        2,207,595  4,405,267  -854,506    249,291
4 to 5   Unidimensional  Overall  2,942,383  3,894,243  -1,371,059  100,132
         Unidimensional  4        1,672,823  2,115,732  -786,170    50,241
         Unidimensional  5        1,269,560  1,709,105  -584,889    49,891
         Exploratory     Overall  3,084,751  4,980,134  -1,342,989  199,387
         Exploratory     4        1,742,772  2,624,456  -771,373    100,013
         Exploratory     5        1,341,979  2,217,475  -571,616    99,374
         Claim Scores    Overall  3,446,338  7,228,691  -1,325,280  397,889
         Claim Scores    4        1,870,656  3,629,915  -735,768    199,560
         Claim Scores    5        1,575,682  3,322,982  -589,512    198,329
         Bifactor        Overall  3,837,936  8,563,813  -1,421,824  497,144
         Bifactor        4        2,004,632  4,202,692  -752,981    249,335
         Bifactor        5        1,833,305  4,016,530  -668,843    247,809
5 to 6   Unidimensional  Overall  2,693,333  3,643,487  -1,246,701  99,966
         Unidimensional  5        1,269,703  1,709,283  -584,956    49,895
         Unidimensional  6        1,423,631  1,864,913  -661,744    50,071
         Exploratory     Overall  2,842,088  4,734,451  -1,221,948  199,096
         Exploratory     5        1,342,161  2,217,736  -571,698    99,383
         Exploratory     6        1,499,927  2,378,712  -650,251    99,713
         Claim Scores    Overall  3,202,642  6,979,344  -1,203,973  397,348
         Claim Scores    5        1,468,207  3,215,798  -535,741    198,362
         Claim Scores    6        1,734,435  3,488,126  -668,231    198,986
         Bifactor        Overall  3,594,141  8,313,051  -1,300,592  496,478
         Bifactor        5        1,603,426  3,787,039  -553,860    247,853
         Bifactor        6        1,990,715  4,181,881  -746,732    248,625




Table 9. Fit Measures for ELA across Adjacent Grades, continued 


Grades   Model           Group    AIC        BIC        Likelihood  df
6 to 7   Unidimensional  Overall  2,734,953  3,632,171  -1,272,554  94,923
         Unidimensional  6        1,423,768  1,865,024  -661,816    50,068
         Unidimensional  7        1,311,185  1,701,494  -610,737    44,855
         Exploratory     Overall  2,869,962  4,656,033  -1,246,020  188,961
         Exploratory     6        1,498,621  2,377,370  -649,602    99,709
         Exploratory     7        1,371,341  2,147,975  -596,419    89,252
         Claim Scores    Overall  3,228,796  6,792,497  -1,237,369  377,029
         Claim Scores    6        1,635,272  3,389,033  -618,642    198,994
         Claim Scores    7        1,593,524  3,142,710  -618,727    178,035
         Bifactor        Overall  3,580,506  8,033,060  -1,319,186  471,067
         Bifactor        6        1,766,238  3,957,518  -634,481    248,638
         Bifactor        7        1,814,268  3,749,752  -684,705    222,429
7 to 8   Unidimensional  Overall  2,595,184  3,403,519  -1,211,203  86,389
         Unidimensional  7        1,311,172  1,701,351  -610,746    44,840
         Unidimensional  8        1,284,012  1,642,351  -600,457    41,549
         Exploratory     Overall  2,712,594  4,320,731  -1,184,431  171,866
         Exploratory     7        1,368,710  2,145,152  -595,125    89,230
         Exploratory     8        1,343,883  2,056,577  -589,306    82,636
         Claim Scores    Overall  3,037,992  6,245,658  -1,176,184  342,812
         Claim Scores    7        1,488,031  3,037,025  -566,002    178,013
         Claim Scores    8        1,549,961  2,971,268  -610,181    164,799
         Bifactor        Overall  3,357,227  7,364,695  -1,250,324  428,289
         Bifactor        7        1,608,923  3,544,206  -582,055    222,406
         Bifactor        8        1,748,304  3,523,941  -668,269    205,883
8 to 9   Unidimensional  Overall  2,007,224  2,623,188  -935,996    67,616
         Unidimensional  8        1,283,475  1,642,125  -600,153    41,585
         Unidimensional  9        723,748    936,000    -335,843    26,031
         Exploratory     Overall  2,106,595  3,330,799  -918,914    134,384
         Exploratory     8        1,346,541  2,059,674  -590,583    82,687
         Exploratory     9        760,054    1,181,582  -328,330    51,697
         Claim Scores    Overall  2,355,982  4,796,591  -910,079    267,912
         Claim Scores    8        1,454,408  2,876,536  -562,310    164,894
         Claim Scores    9        901,573    1,741,563  -347,769    103,018
         Bifactor        Overall  2,592,106  5,640,955  -961,373    334,680
         Bifactor        8        1,564,631  3,341,268  -576,317    205,999
         Bifactor        9        1,027,475  2,076,717  -385,057    128,681




Table 9. Fit Measures for ELA across Adjacent Grades, continued 


Grades    Model           Group    AIC        BIC        Likelihood  df
9 to 10   Unidimensional  Overall  1,211,766  1,578,699  -563,413    42,470
          Unidimensional  9        723,694    935,849    -335,828    26,019
          Unidimensional  10       488,071    614,499    -227,585    16,451
          Exploratory     Overall  1,274,759  2,001,981  -553,209    84,171
          Exploratory     9        761,797    1,183,186  -329,218    51,680
          Exploratory     10       512,962    762,658    -223,990    32,491
          Claim Scores    Overall  1,417,427  2,865,158  -541,149    167,565
          Claim Scores    9        833,651    1,673,535  -313,821    103,005
          Claim Scores    10       583,776    1,079,925  -227,328    64,560
          Bifactor        Overall  1,561,259  3,369,279  -571,364    209,266
          Bifactor        9        899,379    1,948,523  -321,021    128,669
          Bifactor        10       661,880    1,281,275  -250,343    80,597
10 to 11  Unidimensional  Overall  1,213,870  1,518,346  -570,955    35,980
          Unidimensional  10       487,682    613,971    -227,408    16,433
          Unidimensional  11       726,188    879,570    -343,547    19,547
          Exploratory     Overall  1,261,019  1,860,730  -559,642    70,868
          Exploratory     10       513,972    763,477    -224,520    32,466
          Exploratory     11       747,047    1,048,380  -335,121    38,402
          Claim Scores    Overall  1,375,980  2,566,093  -547,354    140,636
          Claim Scores    10       552,391    1,048,348  -211,660    64,535
          Claim Scores    11       823,589    1,420,740  -335,694    76,101
          Bifactor        Overall  1,485,638  2,970,985  -567,295    175,524
          Bifactor        10       598,001    1,217,196  -218,430    80,571
          Bifactor        11       887,637    1,632,715  -348,865    94,953




Table 10. Fit Measures for Mathematics across Adjacent Grades 


Grades   Model           Group    AIC        BIC        Likelihood  df
3 to 4   Unidimensional  Overall  2,609,055  3,402,805  -1,219,552  84,976
         Unidimensional  3        1,245,590  1,597,234  -581,946    40,849
         Unidimensional  4        1,363,465  1,746,733  -637,606    44,127
         Exploratory     Overall  2,724,905  4,305,109  -1,193,282  169,171
         Exploratory     3        1,299,575  1,999,652  -568,463    81,325
         Exploratory     4        1,425,330  2,188,322  -624,819    87,846
         Claim Scores    Overall  3,024,199  6,177,237  -1,174,546  337,553
         Claim Scores    3        1,417,002  2,813,971  -546,221    162,280
         Claim Scores    4        1,607,197  3,129,542  -628,326    175,273
         Bifactor        Overall  3,226,816  7,166,308  -1,191,660  421,748
         Bifactor        3        1,521,641  3,267,069  -558,061    202,759
         Bifactor        4        1,705,175  3,607,218  -633,599    218,989
4 to 5   Unidimensional  Overall  2,981,009  3,836,472  -1,399,584  90,921
         Unidimensional  4        1,364,880  1,748,122  -638,316    44,124
         Unidimensional  5        1,616,129  2,025,368  -761,268    46,797
         Exploratory     Overall  3,086,050  4,789,382  -1,361,990  181,035
         Exploratory     4        1,427,225  2,190,182  -625,770    87,842
         Exploratory     5        1,658,825  2,473,795  -736,220    93,193
         Claim Scores    Overall  3,436,284  6,835,279  -1,356,887  361,255
         Claim Scores    4        1,564,470  3,086,883  -606,954    175,281
         Claim Scores    5        1,871,814  3,498,151  -749,933    185,974
         Bifactor        Overall  3,637,368  7,884,233  -1,367,315  451,369
         Bifactor        4        1,673,654  3,575,809  -617,825    219,002
         Bifactor        5        1,963,715  3,995,757  -749,490    232,367
5 to 6   Unidimensional  Overall  2,867,813  3,705,554  -1,344,691  89,215
         Unidimensional  5        1,617,910  2,027,052  -762,169    46,786
         Unidimensional  6        1,249,902  1,616,768  -582,522    42,429
         Exploratory     Overall  2,975,243  4,643,466  -1,309,964  177,657
         Exploratory     5        1,669,399  2,484,237  -741,521    93,178
         Exploratory     6        1,305,844  2,036,300  -568,443    84,479
         Claim Scores    Overall  3,309,818  6,638,931  -1,300,376  354,533
         Claim Scores    5        1,823,344  3,449,602  -725,707    185,965
         Claim Scores    6        1,486,474  2,944,012  -574,669    168,568
         Bifactor        Overall  3,497,384  7,656,979  -1,305,717  442,975
         Bifactor        5        1,920,776  3,952,756  -728,028    232,360
         Bifactor        6        1,576,608  3,397,710  -577,689    210,615




Table 10. Fit Measures for Mathematics across Adjacent Grades, continued 


Grades   Model           Group    AIC        BIC        Likelihood  df
6 to 7   Unidimensional  Overall  2,373,141  3,151,522  -1,103,081  83,489
         Unidimensional  6        1,247,380  1,614,177  -581,269    42,421
         Unidimensional  7        1,125,761  1,479,486  -521,812    41,068
         Exploratory     Overall  2,494,563  4,044,090  -1,081,079  166,202
         Exploratory     6        1,305,116  2,035,476  -568,090    84,468
         Exploratory     7        1,189,447  1,893,434  -512,990    81,734
         Claim Scores    Overall  2,803,345  5,895,090  -1,070,052  331,620
         Claim Scores    6        1,448,121  2,905,634  -555,496    168,565
         Claim Scores    7        1,355,223  2,759,640  -514,557    163,055
         Bifactor        Overall  2,985,167  6,848,058  -1,078,251  414,333
         Bifactor        6        1,546,176  3,367,277  -562,473    210,615
         Bifactor        7        1,438,991  3,193,645  -515,778    203,718
7 to 8   Unidimensional  Overall  2,310,404  3,115,824  -1,069,098  86,104
         Unidimensional  7        1,125,531  1,479,238  -521,699    41,066
         Unidimensional  8        1,184,873  1,576,994  -547,399    45,038
         Exploratory     Overall  2,432,489  4,035,883  -1,044,833  171,412
         Exploratory     7        1,189,292  1,893,253  -512,915    81,731
         Exploratory     8        1,243,198  2,024,001  -531,918    89,681
         Claim Scores    Overall  2,758,373  5,957,640  -1,037,167  342,020
         Claim Scores    7        1,322,333  2,726,827  -498,103    163,064
         Claim Scores    8        1,436,040  2,994,112  -539,064    178,956
         Bifactor        Overall  2,946,770  6,944,012  -1,046,057  427,328
         Bifactor        7        1,424,973  3,179,747  -508,754    203,732
         Bifactor        8        1,521,798  3,468,526  -537,303    223,596
8 to 9   Unidimensional  Overall  1,702,770  2,288,505  -786,775    64,610
         Unidimensional  8        1,184,158  1,576,296  -547,039    45,040
         Unidimensional  9        518,613    672,584    -239,736    19,570
         Exploratory     Overall  1,785,655  2,951,024  -764,280    128,547
         Exploratory     8        1,245,487  2,026,316  -533,059    89,684
         Exploratory     9        540,168    845,931    -231,221    38,863
         Claim Scores    Overall  2,027,808  4,352,371  -757,491    256,413
         Claim Scores    8        1,401,321  2,959,559  -521,686    178,975
         Claim Scores    9        626,486    1,235,746  -235,805    77,438
         Bifactor        Overall  2,160,865  5,065,062  -760,082    320,350
         Bifactor        8        1,504,595  3,451,549  -528,675    223,622
         Bifactor        9        656,270    1,417,297  -231,407    96,728




Table 10. Fit Measures for Mathematics across Adjacent Grades, continued 


Grades    Model           Group    AIC        BIC        Likelihood  df
9 to 10   Unidimensional  Overall  886,249    1,156,660  -410,798    32,326
          Unidimensional  9        516,989    670,991    -238,920    19,574
          Unidimensional  10       369,260    463,987    -171,878    12,752
          Exploratory     Overall  922,509    1,458,271  -397,207    64,047
          Exploratory     9        537,339    843,148    -229,800    38,869
          Exploratory     10       385,170    572,203    -167,407    25,178
          Claim Scores    Overall  1,052,414  2,118,811  -398,726    127,481
          Claim Scores    9        614,092    1,223,540  -229,584    77,462
          Claim Scores    10       438,322    809,885    -169,142    50,019
          Bifactor        Overall  1,110,102  2,441,850  -395,849    159,202
          Bifactor        9        649,857    1,411,136  -228,168    96,760
          Bifactor        10       460,246    924,092    -167,681    62,442
10 to 11  Unidimensional  Overall  876,674    1,194,819  -400,926    37,411
          Unidimensional  10       369,223    464,151    -171,832    12,779
          Unidimensional  11       507,452    706,648    -229,094    24,632
          Exploratory     Overall  933,765    1,561,840  -393,026    73,856
          Exploratory     10       388,458    575,789    -169,011    25,218
          Exploratory     11       545,306    938,637    -224,015    48,638
          Claim Scores    Overall  1,072,896  2,320,763  -389,710    146,738
          Claim Scores    10       428,775    800,932    -164,288    50,099
          Claim Scores    11       644,121    1,425,630  -225,421    96,639
          Bifactor        Overall  1,141,252  2,699,050  -387,443    183,183
          Bifactor        10       454,125    918,706    -164,521    62,541
          Bifactor        11       687,127    1,662,746  -222,922    120,642




1.6 MIRT Item Statistics and Graphs 


Three primary MIRT item characteristics were computed that correspond to direction, difficulty, and
discrimination, and these are presented graphically. The magnitude given by the length of the vector
corresponds to its discriminating power


$$\sqrt{a_i' a_i}$$


The angle measure of the vector with each axis is


$$\alpha_{ij} = \arccos\left(\frac{a_{ij}}{\sqrt{a_i' a_i}}\right)$$


where $a_{ij}$ is the j-th element of the vector of item discriminations for item i. In order to obtain
degrees, the angle measure is multiplied by 180/π. The quadrant of the plot in which an item resides
roughly corresponds to its difficulty. The multidimensional difficulty is


$$\frac{-d_i}{\sqrt{a_i' a_i}}$$


where $d_i$ is the location or scalar item parameter related to item difficulty.


A composite directional vector can be computed using the matrix of discriminations a and then 
computing the eigenvalues for a’a. Each diagonal value in the matrix is the sum of the squared a- 
elements for each ability dimension of the matrix. The off diagonal values are the sums of the cross 
products of the a-elements from different dimensions. The eigenvector that corresponds to the 
largest eigenvalue is eigenvector one. The sum of the squared elements of the eigenvector is equal 
to one, and these elements have the properties of direction cosines. The direction cosines give the 
orientation of the reference composite with respect to the coordinate axes of the ability space. The 
angle between the reference composite and the coordinate axes can be determined by taking the 
arccosine of the elements of the eigenvector. 
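
The item vector statistics and the reference composite described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the Pilot analyses: the discrimination matrix is hypothetical, the difficulty follows Reckase's convention (logit written as a'θ + d, so difficulty is -d divided by the vector length), and the leading eigenvector is found by simple power iteration.

```python
import math


def vector_length(a):
    """Multidimensional discrimination: length of the item vector."""
    return math.sqrt(sum(x * x for x in a))


def direction_angles_degrees(a):
    """Angle of the item vector with each coordinate axis, in degrees."""
    length = vector_length(a)
    return [math.degrees(math.acos(x / length)) for x in a]


def multidimensional_difficulty(a, d):
    """Assumes the logit is a'theta + d (Reckase's convention)."""
    return -d / vector_length(a)


def reference_composite(a_matrix, n_iter=200):
    """Unit-length leading eigenvector of a'a, by power iteration."""
    n_dim = len(a_matrix[0])
    ata = [
        [sum(row[r] * row[c] for row in a_matrix) for c in range(n_dim)]
        for r in range(n_dim)
    ]
    v = [1.0] * n_dim
    for _ in range(n_iter):
        w = [sum(ata[r][c] * v[c] for c in range(n_dim)) for r in range(n_dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical two-dimensional discriminations for three items.
a_matrix = [[1.2, 0.5], [0.8, 0.6], [0.3, 1.0]]
composite = reference_composite(a_matrix)
composite_angles = [math.degrees(math.acos(x)) for x in composite]
item_angles = direction_angles_degrees(a_matrix[1])
```

Because the eigenvector has unit length, its elements act directly as direction cosines, so the arccosine of each element gives the angle between the reference composite and the corresponding ability axis.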


The Reckase, Martineau, & Kim (2000) item vector approach will be used to evaluate the 
characteristics of exploratory models using complex structure. The graphs showing the item vectors 
used the exploratory model with two dimensions. The development of these measures is conducted 
in a polar coordinate system so that direction can be specified as an angle from a particular axis. 
Using the MIRT item discrimination, the directions of maximum discrimination and MIRT item 
difficulty can all be depicted in the same graph. The origin of the item vectors is the MIRT difficulty. 
The reference composite vector composed of all items is also shown as a large red arrow. Item 
vectors that point in the same essential direction measure essentially the same dimension. Note that 
by definition, graphs of simple structure are not useful since all items are assigned to a defined axis
corresponding to a factor.


The item vector plots are presented in Appendix A using the two-dimensional exploratory model. Plots 
are presented for ELA and Mathematics within grade, across two adjacent grades and for the subset 
of common, vertical linking items. The graphs of directional measures are presented in Figures² A.1
to A.9 for ELA. Figures A.11 to A.17 show item vectors for ELA across (adjacent) grades, and Figures 
A.18 to A.25 show them for the subset of vertical linking items. The graphs of directional measures 
are presented in Figures A.26 to A.34 for Mathematics. Figures A.35 to A.42 show item vectors for 
Mathematics across adjacent grades, and Figures A.43 to A.50 show them for the subset of linking 
items. The exploratory model is presented for diagnostic purposes to lend further insight into item 
functioning across dimensions. The plots for the exploratory model suggest that most items are 
primarily influenced by a composite of both factors. The item vector plot for Mathematics for the 
vertical linking items for Grades 8 and 9 shows the composite vector more closely associated with 
the first factor ($\theta_1$). This difference is reasonable since this delineates the transition to high school
course-specific content. In addition, for the vertical linking set for ELA Grades 9 and 10, some highly
discriminating items are associated with the first factor. 


1.7 Discussion and Conclusion 


The evidence suggests that the unidimensional model was consistently the preferred model. This is 
consistent with the use of traditional IRT models for calibration and linking. No changes are 
warranted to the scaling design, and all items for a grade and content area can be calibrated 
together simultaneously. Although a unidimensional model was preferred, differences in
dimensionality were most evident in Mathematics in the transition from Grade 8 to Grade 9. This
difference is expected because Grade 9 marks the transition into the course-specific content that
characterizes high school.


The approach adopted here is to use the best available information from the Pilot to inform decision- 
making regarding future development phases. At the minimum, the test dimensionality study based 
on the Pilot Test can only be viewed as preliminary and may need to be readdressed in the future. 
This is partly reflected in the changes that occurred in the item types, content configurations, and 
test design used in the Pilot compared with those employed for the Field Test. An overall concern is 
the degree of implementation of the Common Core State Standards across the Consortium. This will 
affect the results of this dimensionality study in ways that cannot currently be anticipated. 


² The item vector plots represent separate calibrations.


2. Item Response Theory (IRT) Model Comparison 


Within the family of IRT models there are two major choices to be made: 1) use of a unidimensional 
or multidimensional model, and 2) within the category of unidimensional models, the use of a Rasch 
one-parameter/partial credit model (Rasch/PC) combination, a two-parameter logistic/generalized 
partial credit model (2PL/GPC) combination, or a three-parameter logistic/generalized partial credit 
(3PL/GPC) combination. 


It is highly desirable that a unidimensional model be used since the properties of these models are 
well known for scaling and are ones that have been used extensively in K-12 education to make 
critical decisions concerning students, teachers, and schools. Also, the IRT models selected must be 
implemented in operational Computerized Adaptive Testing (CAT). A multidimensional CAT with many 
constraints would be highly difficult to implement. Selection of an IRT model to comply with CAT 
implementation constraints may override other considerations. 


The dimensionality study results from the previous section suggest that a unidimensional IRT model 
with a single IRT scale within each grade level could be used. Three unidimensional IRT model 
combinations are applied to the Pilot data for the dichotomous and polytomous item calibration. 
Specifically, these combinations are the Rasch one-parameter/partial credit model (1PL/PC) 
combination, the two-parameter logistic/generalized partial credit model (2PL/GPC) combination, 
and the three-parameter logistic/generalized partial credit model (3PL/GPC) combination.
Calibration and scaling results based on all the three IRT model combinations are presented and 
compared, and they are used for making recommendations for IRT model choice for the Field Test 
and operational use and for determining the set of item parameter estimates to be stored in the item 
bank. 


The Smarter Balanced assessment includes SR items, CR items, and performance task (PT) items 
that include both SR and CR items. For SR items, a 3PL, 2PL, or 1PL (Rasch) model is used. The
3PL model is given by 


$$P_i(\theta_j) = c_i + \frac{1 - c_i}{1 + \exp\left(-D a_i (\theta_j - b_i)\right)},$$


where $P_i(\theta_j)$ is the probability of a correct response to item i by an examinee with ability $\theta_j$; $a_i$,
$b_i$, and $c_i$ are the discrimination, difficulty, and lower asymptote parameters, respectively, for item i;
and D is a constant that puts the θ ability scale in the same metric as the normal ogive model (D =
1.7). The 3PL model can be constrained to equal the Rasch model by setting the constant a
parameter to 1/D and the c parameter to 0. If the a parameter is left free to vary by item and c = 0,
then the 2PL model results.
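
The nesting of the three dichotomous models can be made concrete with a short sketch. The function names are illustrative assumptions; the formula and the constraints (a = 1/D and c = 0 for Rasch, c = 0 for 2PL) are those given above.

```python
import math

D = 1.7  # scaling constant from the text


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_2pl(theta, a, b):
    """2PL: the 3PL with the lower asymptote fixed at zero."""
    return p_3pl(theta, a, b, 0.0)


def p_rasch(theta, b):
    """Rasch: the 3PL with a = 1/D and c = 0, so D * a = 1."""
    return p_3pl(theta, 1.0 / D, b, 0.0)
```

For an examinee whose ability equals the item difficulty, the Rasch and 2PL functions return 0.5, while the 3PL returns c + (1 - c)/2, which shows how the lower asymptote raises the curve.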


For CR items, the generalized partial credit model (Muraki, 1992) or partial credit model (Masters, 
1982) is employed. The generalized partial credit model is given by 




$$P_{ih}(\theta_j) = \frac{\exp\left(\sum_{v=1}^{h} D a_i (\theta_j - b_i + d_{iv})\right)}{\sum_{c=1}^{n_i} \exp\left(\sum_{v=1}^{c} D a_i (\theta_j - b_i + d_{iv})\right)},$$


where $P_{ih}(\theta_j)$ is the probability of examinee j obtaining a score of h on item i, $n_i$ is the number of
score categories item i contains, $b_i$ is the location parameter for item i, $d_{iv}$ is the category parameter
for item i for category v, and D is a scaling constant. The generalized partial credit model can be
constrained to equal the partial credit model by setting the a parameter to 1/D. The generalized
partial credit model is equivalent to the two-parameter partial credit model used in the
dimensionality study in the previous section (Yen and Fitzpatrick, 2006).
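
A minimal sketch of the generalized partial credit probabilities follows. It evaluates the formula above for every score category of a single item; the function name and the example parameter values are assumptions, and the category parameters are passed as a list d with one entry per category v = 1, ..., n.

```python
import math

D = 1.7  # scaling constant from the text


def gpc_probabilities(theta, a, b, d):
    """Category probabilities for one item; d[v - 1] holds d_iv for v = 1..n."""
    cumulative = []
    running = 0.0
    for d_v in d:
        running += D * a * (theta - b + d_v)
        cumulative.append(running)
    # Subtracting the maximum before exponentiating avoids overflow and
    # leaves the ratios unchanged.
    peak = max(cumulative)
    weights = [math.exp(z - peak) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category item evaluated at theta = b.
probs = gpc_probabilities(theta=0.0, a=1.0, b=0.0, d=[0.0, 0.5, -0.5])

# Partial credit model: the same function with a fixed at 1 / D.
pc_probs = gpc_probabilities(theta=0.0, a=1.0 / D, b=0.0, d=[0.0, 0.5, -0.5])
```

In this example the category parameters are symmetric around zero and the ability equals the item location, so the lowest and highest categories are equally likely and the middle category is the most probable.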


The choice of models within a unidimensional structure should take into account various 
considerations, including: 


1. Model simplicity or parsimony. Model selection should balance goodness-of-fit and model 
simplicity. The Rasch model, which is easier to work with than the 2PL/GPC and 3PL/GPC, 
has worked well in many K-12 applications. 

2. Model fit. Because the 3PL/GPC is a more general model, it provides better statistical model 
fit than the 2PL/GPC and the 1PL/PC; the 2PL/GPC provides better fit than 1PL/PC. (Often 
the improvement in fit from 2PL to 3PL can be far smaller than from 1PL to 2PL [Haberman,
2010]). However, statistical model fit, by itself, is not a sufficient basis for model choice. The 
practical implications of model choice should also be considered. For example, for CAT 
administration that aims to deliver items targeted at a specific student’s ability level, fit of 
the IRT curve in the middle range may be more consequential than fit of the curve at the two 
ends. The primary practical implication of model misfit is a systematic difference between 
observed and predicted item characteristic functions, which affects the accuracy of scoring 
(i.e., the relationship of raw scores and trait estimates). Some item properties that affect 
model fit include: 

o  Discriminations that vary systematically by item difficulty or trait level. The Rasch 
model assumes that item discrimination is uncorrelated with item difficulty. By 
examining plots or correlations of item discrimination versus item difficulty for the 
2PL/GPC one can determine if the Rasch assumption is suitable for the Smarter 
tests. This result is also relevant to vertical scaling, since item discriminations for 
the same items administered across grade levels affect the vertical scaling. 

o  Discriminations that vary systematically by item type (SR versus CR), number of 
score categories, or Claim areas. CR items with multiple score levels and/or CR 
scores based on the sum of multiple raters might be expected to have variant 
discrimination and not be adequately represented by the Rasch model (Sykes & 
Yen, 2000; Fitzpatrick, Link, Yen, Burket, Ito, & Sykes, 1996). Again, the results 
of the 2PL/GPC can be examined to see if there is a systematic relationship 
between item type/number of score categories/claim area and item
discrimination.


3. Model stability. Findings from Holland (1990) indicated that unconstrained 3PL models must 
be expected to have stability problems. His study revealed that in the typical case of a 
standard normal prior, a unidimensional IRT model for dichotomous responses can be 
approximated by a log-linear model with only main effects and interactions. For a test of q 
items, the approximation is determined by 2q parameters, while 3PL model would require 3q 
parameters. This stability issue can be readily addressed by having appropriate priors on the 
c parameters, including holding them constant at logical values, particularly when sample
sizes are small. 

4. Reasonableness of the vertical scale. Since the selected IRT model will be used to establish 
a vertical scale, it is important to evaluate the reasonableness of the vertical scale, including 
expected growth from one grade to another, before making final decisions on the model for 
adoption. As suggested by research, the choice of the IRT scaling model may shrink or 
stretch out a measurement scale (Yen, 1981), and will subsequently impact the
measurement of growth (Briggs & Weeks, 2009). Both the Rasch and 3PL have been used 
for developing K-12 vertical scales, and in the last two decades their scale properties have 
been broadly accepted by K-12 users (Yen & Fitzpatrick, 2006). 


To support the Consortium in the IRT model selection process, the following results, including 
dimensionality analysis, IRT calibration, fit comparison, guessing evaluation, common discrimination 
evaluation and ability estimates evaluation results, are provided using the data collected in the Pilot 
administration. Both ELA and Math results are described. However, Math PT items are not included 
in the analysis. A considerable portion of the vertical linking items administered to upper grade levels 
show reverse growth patterns, which may be related to common core implementation progress. 
Given these vertical linking item issues, it is not possible to evaluate the reasonableness of the 
vertical scale as part of the model comparison analyses. For this reason, vertical scaling results are 
not provided as part of the model comparison analysis at this time. 


2.1 Data Treatment 


As indicated in the Background section of this memorandum, students took either multiple CAT 
components or a combination of CAT and PT components during the Pilot Test administration. The 
CAT or PT components administered might be on-grade or off-grade to facilitate vertical linking, but 
each participating student was administered at least one on-grade CAT module. PT items were 
included in the ELA IRT model comparison analyses but not in the Math analyses. 


The first step was to create a sparse data matrix reflecting item scores as well as missing
information by design. For a given grade, the dimension of the sparse matrix is the total number of
students times the total number of unique items (i.e., scorable units). For each student, the cells
for administered items hold the item scores. The remaining cells, representing items not
administered to that student, are marked as missing in the sparse matrix and are treated as
“not presented” items in the IRT calibration. Data cleaning as described below was conducted
before calibrations.
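
As an illustration only, the sparse matrix described above can be sketched as follows. The record layout (student, item, score), the identifiers, and the use of None as the “not presented” marker are assumptions made for this sketch; they are not taken from the Pilot data files.

```python
# Sentinel for cells that are missing by design (item not administered).
# Using None here is an assumption of this sketch.
NOT_PRESENTED = None


def build_sparse_matrix(responses, students, items):
    """Return a students x items matrix of item scores.

    responses: iterable of (student_id, item_id, score) tuples
               (hypothetical layout for illustration).
    Cells with no response record are filled with NOT_PRESENTED.
    """
    scored = {(s, i): score for s, i, score in responses}
    return [[scored.get((s, i), NOT_PRESENTED) for i in items] for s in students]


students = ["s1", "s2"]
items = ["item_a", "item_b", "item_c"]
responses = [("s1", "item_a", 1), ("s1", "item_b", 0), ("s2", "item_b", 2)]

matrix = build_sparse_matrix(responses, students, items)
for row in matrix:
    print(row)
```

The matrix has one row per student and one column per scorable unit, so its dimension is the number of students times the number of unique items, as stated above.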




Data Cleaning Before Calibrations 


The following item exclusion rules were followed: 


a. Items that have no scored responses, or items that have scored responses in only one
category were excluded;

b. CAT items that have on-grade item total correlations < 0.15 were removed from on-grade
AND off-grade data sets regardless of their off-grade performance;

c. CAT items that have been recommended for “rejection” per content experts during data
review meetings were removed from on-grade AND off-grade data sets;

d. PT items that have negative on-grade item total correlations were removed from on-grade
AND off-grade data sets;

e. CAT or PT items with negative off-grade but reasonable on-grade item-total correlations were
removed from the specific off-grade data sets only. For dimensionality studies, off-grade
responses were calibrated together with on-grade responses.
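
The exclusion rules above can be sketched in code. This is a minimal illustration, not the procedure actually run for the Pilot: the dictionary keys, the function names, and the reading of “reasonable” on-grade correlation as “not already excluded by the on-grade rules” are all assumptions of the sketch.

```python
CAT_MIN_ITEM_TOTAL_R = 0.15  # on-grade threshold for CAT items, from the rules above


def dropped_everywhere(item):
    """True if the item is removed from on-grade AND off-grade data sets.

    item: dict with hypothetical keys
      'type'                -> 'CAT' or 'PT'
      'observed_categories' -> number of score categories with scored responses
      'on_grade_r'          -> on-grade item-total correlation
      'rejected'            -> True if content experts recommended rejection
    """
    if item["observed_categories"] < 2:
        return True  # no scored responses, or responses in only one category
    if item["type"] == "CAT":
        return item["on_grade_r"] < CAT_MIN_ITEM_TOTAL_R or item["rejected"]
    if item["type"] == "PT":
        return item["on_grade_r"] < 0
    return False


def dropped_from_off_grade_set(item, off_grade_r):
    """True if the item is removed from one specific off-grade data set."""
    return dropped_everywhere(item) or off_grade_r < 0


weak_cat = {"type": "CAT", "observed_categories": 2, "on_grade_r": 0.10, "rejected": False}
good_pt = {"type": "PT", "observed_categories": 3, "on_grade_r": 0.05, "rejected": False}

print(dropped_everywhere(weak_cat))                 # on-grade r below 0.15
print(dropped_everywhere(good_pt))                  # PT items only need r >= 0
print(dropped_from_off_grade_set(good_pt, -0.02))   # negative off-grade r
```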


The following category pre-treatments were followed: 


a. Categories that have a reversed pattern of average criterion score progression (i.e., the
average criterion score for a lower score category was higher than the average criterion score
for a higher score category) at the on-grade level were collapsed in both on-grade AND
off-grade data sets;

b. Categories with fewer than 10 examinees at on-grade level were collapsed with neighboring
categories in both on-grade AND off-grade data sets. If the category that needed to be
collapsed was a middle category, it was collapsed with the neighboring category that had
lower student counts compared to the other neighboring category;

c. Categories that had a reversed pattern of average criterion score progression (i.e., the
average criterion score for a lower score category was higher than the average criterion score
for a higher score category) at the off-grade level but not at the on-grade level were collapsed
in the specific off-grade data sets;

d. Categories with fewer than 10 examinees at the off-grade level but 10 or more examinees at
the on-grade level were collapsed with neighboring categories in the specific off-grade data
sets.
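
The sparse-category rule above (fewer than 10 examinees) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the left-to-right processing order, and the tie-break toward the lower neighbor when both neighbors have equal counts are not specified in this memorandum.

```python
MIN_EXAMINEES = 10  # threshold from the category pre-treatment rules above


def collapse_sparse_categories(counts, min_n=MIN_EXAMINEES):
    """Group score categories so that sparse ones are merged with a neighbor.

    counts[k] is the number of examinees in score category k.
    Returns a list of groups; each group lists the original categories
    that end up sharing one collapsed category.
    """
    groups = [[k] for k in range(len(counts))]
    sizes = list(counts)
    while len(groups) > 1:
        sparse = [g for g, n in enumerate(sizes) if n < min_n]
        if not sparse:
            break
        g = sparse[0]
        if g == 0:
            neighbor = 1                      # lowest category: only one neighbor
        elif g == len(groups) - 1:
            neighbor = g - 1                  # highest category: only one neighbor
        else:
            # Middle category: merge with the neighbor that has fewer examinees
            # (ties go to the lower neighbor; an assumption of this sketch).
            neighbor = g - 1 if sizes[g - 1] <= sizes[g + 1] else g + 1
        lo, hi = min(g, neighbor), max(g, neighbor)
        groups[lo] = groups[lo] + groups[hi]
        sizes[lo] = sizes[lo] + sizes[hi]
        del groups[hi]
        del sizes[hi]
    return groups


print(collapse_sparse_categories([120, 4, 30, 200]))
```

In the example, category 1 has only 4 examinees and sits between categories with 120 and 30 examinees, so it is merged with category 2, the neighbor with the lower count.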


ELA and Math items that were dropped or received category pre-treatment before calibrations based
on the above data cleaning procedure are listed in Tables B.1 and B.2 in Appendix B, respectively.
These tables are presented separately in the appendix due to their lengths. Of all the items that
required category collapsing due to sparse responses, more than 70% had fewer than 1,500
valid responses from the Pilot administration. This result emphasizes the importance of ensuring
sufficient item-level sample size in the upcoming Field Test administration. The number of CAT/PT
items that entered into ELA and Math IRT analyses after data cleaning and the examinee sample
sizes associated with them are presented in Tables 11 and 12.


Table 11. Number of ELA Items in IRT Calibration and Examinee Sample Sizes at the Item Level

                     No. of Items            Examinee Sample Size
Grade  Item Grade    Total  CAT       PT         per CAT Item     per PT Item
3      3             231    200       31         1,281-9,846      864-5,872
3      4             48     44        4          1,377-6,641      1,408-1,413
4      3             48     44        4          1,101-2,501      1,293-1,304
4      4             217    179       38         1,466-16,343     897-3,518
4      5             36     35        1          1,121-3,996      1,275
5      4             40     36        4          1,182-2,636      1,300-1,313
5      5             175    144       31         1,420-18,373     950-2,797
5      6             34     31        3          1,177-4,100      1,251-1,303
6      5             23     23        0          1,278-4,048
6      6             202    161       41         1,399-12,760     929-3,577
6      7             38     36        2          1,332-4,285      1,828-1,863
7      6             37     35        2          1,096-3,443      1,787-1,792
7      7             195    163       32         1,378-12,078     1,066-3,835
7      8             43     41        2          1,060-3,493      1,731-1,781
8      7             38     36        2          980-2,133        1,498-1,515
8      8             202    168       34         1,084-13,077     1,074-3,867
8      9             39     39        0          492-1,076
9      8             38     35        3          1,197-3,980      [illeg.]-1,502
9      9             126    [illeg.]  [illeg.]   4,583-5,008      [illeg.]
9      10            46     46        0          1,139-1,374
10     9             41     41        0          507-615
10     10            133    109       24         1,382-3,013      369-527
10     11            50     48        2          522-1,206        549-551
11     9             80     80        0          256-322
11     10            107    107       0          249-320
11     11            261    221       40         1,328-3,729      384-1,710



Table 12. Number of Math Items in IRT Calibration and Examinee Sample Sizes at the Item Level 


Grade  Item Grade    No. of CAT Items    Examinee Sample Size per CAT Item
3      3             207                 416-14,735
3      4             47                  1,743-3,582
4      3             38                  1,917-4,373
4      4             209                 497-9,642
4      5             37                  1,941-4,355
5      4             41                  2,129-4,636
5      5             204                 496-10,338
5      6             39                  2,062-4,607
6      5             41                  1,838-4,030
6      6             189                 483-9,213
6      7             48                  1,807-3,939
7      6             41                  912-1,992
7      7             190                 441-11,138
7      8             37                  952-2,148
8      7             33                  1,422-3,292
8      8             191                 473-8,556
8      9             47                  1,416-3,280
9      8             23                  1,273-2,775
9      9             103                 484-6,497
9      10            56                  1,287-2,826
10     9             51                  692-1,511
10     10            122                 493-3,889
10     11            48                  634-1,528
11     9             80                  561-2,709
11     10            90                  536-2,910
11     11            263                 422-5,407




2.2 IRT Model Calibration 


IRT calibration is conducted based on 1PL/PC, 2PL/GPC, and 3PL/GPC model combinations using 
PARSCALE (Muraki & Bock, 2003). PARSCALE properties are well known, and a variety of 
unidimensional IRT models can be implemented with it. 


Item Treatment Rules During Calibration


Item treatment was conducted during calibration in situations of non-convergence or unreasonably
large standard errors for item parameter estimates. Non-convergence was defined as either failing
to reach the criterion of a largest parameter change below 0.005 or showing an erratic pattern of
-2 log-likelihood values. Standard errors were evaluated against item parameter estimates as part
of the reasonableness check procedure. Calibration issues in the Pilot analyses were found to be
mostly caused by the following:


a. Local item dependence (LID). Some PT items with item IDs ending in “a” and “b” (i.e., ID
values such as “40583a” and “40583b”) are highly correlated. These items involved the
same student responses scored with different rubrics. The LID makes these items appear
highly discriminating, thus causing problems for PARSCALE in locating the slope parameter 
estimates for these items. 

b. Low item discrimination. While CAT items with item-total correlations lower than 0.15 have 
been removed from the pool, there are still some PT items with poor discrimination. Items 
with poor discrimination, especially ones that are difficult, sometimes cause convergence 
issues in 3PL calibration. 

c. Guessing parameter indeterminacy in the 3PL. Guessing parameter starting values may 
cause issues in a 3PL calibration, sometimes leading to large standard errors for difficulty 
estimates (> 1.0) or unreasonable guessing parameter estimates (zero guessing parameter 
estimates associated with standard errors larger than 0.04). 


To address these calibration issues and allow smooth estimation, the following item treatments were 
made at the individual item level, when a specific item was identified as being problematic. 


For SR Items: 


a. For the 3PL model, guessing parameter starting values were changed for the item. First the
guessing parameter starting value was changed to .25, next to .10, and next to 0, if
calibration issues persisted.

b. For the 3PL model, the guessing parameter was held at a fixed value if changing the
guessing parameter starting value did not solve the calibration issues. The guessing
parameter was first fixed to .25, next to .10, and next to 0 if calibration issues persisted.

c. If none of the above item treatments solved the calibration issue, then the item was
removed.
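
The escalation for SR items can be sketched as a simple loop. This is an illustration only: `calibrate` stands in for a 3PL calibration run and is a hypothetical callable, not a PARSCALE interface; it is assumed to return True when the run converges with reasonable standard errors.

```python
GUESSING_VALUES = (0.25, 0.10, 0.0)  # order of values tried, from steps a and b above


def treat_sr_item(calibrate):
    """Apply the SR item treatment steps in order and report the outcome.

    calibrate(c_start=..., c_fixed=...) is a hypothetical stand-in for
    re-running the 3PL calibration of one problematic item.
    """
    # Step a: change only the starting value of the guessing parameter.
    for c in GUESSING_VALUES:
        if calibrate(c_start=c, c_fixed=None):
            return ("start", c)
    # Step b: hold the guessing parameter fixed at the same sequence of values.
    for c in GUESSING_VALUES:
        if calibrate(c_start=None, c_fixed=c):
            return ("fixed", c)
    # Step c: nothing worked, so the item is removed.
    return ("removed", None)


# Toy stand-in: this item only calibrates once c is fixed at .10.
outcome = treat_sr_item(lambda c_start, c_fixed: c_fixed == 0.10)
print(outcome)
```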




For CR Items: 


a. Change starting values for the item. For polytomous items, there is an option to use category 
starting values that are constant values from “scores for ordinal or ranked data” instead of 
the PARSCALE default category starting values. 

b. Collapse categories for the polytomous item. 

c. If none of the above item treatments solved a calibration issue, then the item was removed.
Usually when PARSCALE was having a convergence issue due to LID, one item out of the pair
that caused LID was removed.


Items that received treatment during IRT calibration based on the above-described treatment steps
are listed in Tables B.3 and B.4 in Appendix B for the ELA and Math tests, respectively. Note that no
items were deleted from the 1PL analyses and a few items were deleted from the 2PL analyses,
largely due to LID issues. Additional item treatment was made in 3PL analyses due to c-parameter
estimation issues. Thus, there were some differences in the items included in the following results
for the three models.


Under each model combination, the convergence process, IRT parameter estimates as well as 
standard errors associated with them, and item goodness-of-fit analyses results were used to 
evaluate the quality of the resulting item and ability parameter estimates. In general, convergence 
under each IRT model combination was reached and the resulting IRT item/ability parameter 
estimates under each model combination were reasonable. 


2.3 IRT Model Comparisons 

Fit Comparison 

To allow comparison of fit across different IRT model combinations, PARSCALE G2 statistics were
evaluated. In PARSCALE, a likelihood ratio G2 test statistic can be used to compare the frequencies
of correct and incorrect responses in the intervals on the θ continuum with those expected based on
the fitted model (du Toit, 2003):

$$G_j^2 = 2 \sum_{h=1}^{n_g} \left[ r_{hj} \log_e \frac{r_{hj}}{N_h P_j(\bar{\theta}_h)} + (N_h - r_{hj}) \log_e \frac{N_h - r_{hj}}{N_h \left(1 - P_j(\bar{\theta}_h)\right)} \right],$$

where $n_g$ is the total number of intervals, $r_{hj}$ is the observed frequency of correct responses
to item j in interval h, $N_h$ is the number of examinees in interval h, $\bar{\theta}_h$ is the average
ability of examinees in interval h, and $P_j(\bar{\theta}_h)$ is the value of the fitted response function
for item j at $\bar{\theta}_h$. In PARSCALE, G2 statistics are calculated and presented in Phase 2 output.
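
For a dichotomous item, the G2 statistic defined above can be computed directly from interval-level counts. The following is a minimal sketch, not PARSCALE code; the function name and the convention of skipping terms with a zero observed count (treating 0·log 0 as 0) are assumptions of the sketch.

```python
from math import log


def g2_statistic(correct, n_examinees, p_fitted):
    """Likelihood ratio G2 for one dichotomous item.

    correct[h]     -> observed number of correct responses in interval h
    n_examinees[h] -> number of examinees in interval h
    p_fitted[h]    -> fitted probability of a correct response at the
                      average ability of interval h
    """
    total = 0.0
    for r_h, n_h, p_h in zip(correct, n_examinees, p_fitted):
        wrong = n_h - r_h
        if r_h > 0:      # 0 * log(0) is taken as 0
            total += r_h * log(r_h / (n_h * p_h))
        if wrong > 0:
            total += wrong * log(wrong / (n_h * (1.0 - p_h)))
    return 2.0 * total


# Observed proportions equal to the fitted probabilities give G2 of about 0.
print(g2_statistic([30, 60], [100, 100], [0.3, 0.6]))
# A large gap between observed (0.5) and fitted (0.3) inflates G2.
print(g2_statistic([50], [100], [0.3]))
```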


Since the G2 statistic tends to be sensitive to sample size (i.e., flagging more items under larger 
sample sizes), it is used as a descriptive statistic in this study instead of a statistic for significance 
testing. Since there are many items for any grade/content area combination, the distributions of G2 
are compared across IRT model combinations. Tables 13 and 14 present the summary of G2 
statistics across 1PL/PC, 2PL/GPC, and 3PL/GPC models for ELA and Math, respectively. Although
G2 statistics may not be strictly comparable across models due to the difference in degrees of 
freedom, the size of the G2 statistics in general may still provide some evidence for comparing fit 
across models considering that the degree of freedom for each item is roughly comparable across 
different models. The tables show that for most of the tests the mean value of G2 for the 1PL/PC is 
substantially greater than the mean values for the other two model combinations, indicating 
considerable average improvement in fit with 2PL/GPC and 3PL/GPC in comparison with 1PL/PC. 


Table 13. Summary of G2 Statistics of On-Grade ELA Items across 1PL, 2PL, and 3PL IRT Models 


         1PL/PC                         2PL/GPC                        3PL/GPC
Grade    No. of    G2     G2            No. of    G2     G2            No. of    G2     G2
         Items     Mean   SD            Items     Mean   SD            Items     Mean   SD
3        231       151    114           231       79     58            231       79     60
4        217       128    93            216       72     38            216       70     41
5        175       124    [illeg.]      [illeg.]  75     42            174       73     43
6        202       132    99            197       79     51            197       78     51
7        195       127    87            190       84     57            190       84     58
8        202       135    118           199       85     73            199       84     73
9        126       103    67            119       72     44            119       72     45
10       133       93     56            129       63     31            129       62     33
11       261       79     48            259       57     34            259       57     35


Table 14. Summary of G2 Statistics of On-Grade Math Items across 1PL, 2PL, and 3PL IRT Models 


         1PL/PC                         2PL/GPC                        3PL/GPC
Grade    No. of    G2        G2         No. of    G2     G2            No. of    G2     G2
         Items     Mean      SD         Items     Mean   SD            Items     Mean   SD
3        207       127       88         207       86     58            207       84     58
4        209       139       99         209       92     82            209       90     84
5        204       167       127        204       95     77            204       93     80
6        189       145       106        189       96     69            189       93     69
7        190       162       123        190       113    94            190       110    97
8        191       152       111        191       110    86            191       114    99
9        103       [illeg.]  66         103       95     62            103       94     60
10       122       97        52         122       71     42            122       71     44
11       263       72        58         263       72     88            263       68     74


Guessing Evaluation 


The single-selection SR items in the Pilot Test had four answer choices. Since 1PL and 2PL models 
assume minimal guessing, the amount of guessing involved for SR items is evaluated by examining 
the size of guessing parameter estimates under the 3PL/GPC model combinations. Large guessing 
parameter estimates provide evidence for the use of 3PL models and small guessing parameter 
estimates allow possible use of 1PL and 2PL models. Tables 15 and 16 present the mean, standard 
deviation, minimum, maximum, and range of guessing parameter estimates for items administered 
on-grade for ELA and Math, respectively. Results indicate that the average guessing is below .20 for 
most tests. The range of the guessing values showed a consistent pattern across grade levels in that 
the majority of SR items had guessing parameter estimates below .20 but greater than .10. 
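
The range counts reported in Tables 15 and 16 amount to binning the guessing parameter estimates. A minimal sketch is given below; the function name and the handling of boundary values (for example, whether an estimate of exactly .10 falls in the first or second range) are assumptions, since this memorandum does not state them.

```python
def count_c_ranges(c_estimates):
    """Count guessing estimates in the ranges 0-0.10, 0.10-0.20, 0.20-0.30, >0.30.

    Boundary handling is an assumption of this sketch:
    [0, 0.10), [0.10, 0.20), [0.20, 0.30], and above 0.30.
    """
    counts = [0, 0, 0, 0]
    for c in c_estimates:
        if c < 0.10:
            counts[0] += 1
        elif c < 0.20:
            counts[1] += 1
        elif c <= 0.30:
            counts[2] += 1
        else:
            counts[3] += 1
    return counts


print(count_c_ranges([0.05, 0.10, 0.19, 0.25, 0.31]))
```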


Table 15. Summary of Guessing Parameter Estimates for On-Grade ELA Items 


                     No. of    c Estimates Summary          c Estimates Range
Grade  Item Grade    Items     Mean   SD     Min    Max     0-0.10   0.10-0.20   0.20-0.30   >0.30
3      3             76        0.16   0.07   0.06   0.39    16       43          14          3
4      4             111       0.17   0.07   0.04   0.36    20       53          31          7
5      5             77        0.15   0.07   0.00   0.31    16       40          20          1
6      6             75        0.15   0.07   0.05   0.33    23       35          14          3
7      7             76        0.18   0.07   0.06   0.38    9        39          25          3
8      8             77        0.15   0.07   0.00   0.34    16       46          10          5
9      9             36        0.16   0.08   0.04   0.31    10       15          9           2
10     10            46        0.16   0.08   0.00   0.35    9        24          10          3
11     11            91        0.18   0.07   0.04   0.39    12       48          25          6


Table 16. Summary of Guessing Parameter Estimates for On-Grade Math Items 


                     No. of    c Estimates Summary          c Estimates Range
Grade  Item Grade    Items     Mean   SD     Min    Max     0-0.10   0.10-0.20   0.20-0.30   >0.30
3      3             34        0.18   0.07   0.05   0.36    3        21          8           2
4      4             31        0.17   0.06   0.03   0.29    3        18          10          0
5      5             39        0.18   0.10   0.02   0.43    13       10          11          5
6      6             41        0.21   0.09   0.08   0.38    5        14          13          9
7      7             31        0.20   0.08   0.07   0.39    3        12          13          3
8      8             34        0.18   0.07   0.07   0.32    3        18          10          3
9      9             14        0.20   0.08   0.09   0.35    1        8           3           2
10     10            19        0.26   0.11   0.06   0.46    2        3           8           6
11     11            32        0.19   0.08   0.05   0.37    4        15          9           4




Common Discrimination Evaluation 


The Rasch model assumes a common item discrimination across all items. Analyses were conducted 
to evaluate if item discrimination varied systematically by item difficulty, item type (SR vs. CR), 
number of item score categories, or item claim areas. This evaluation was done by plotting item 
discrimination versus item difficulty estimates from the 2PL/GPC model calibrations across all items 
within each grade level, with items of different types (i.e., SR vs CR, items with different numbers of 
score categories, items in different claim areas) highlighted. When the distribution of item 
discrimination is reasonably homogeneous, the selection of a model that assumes equal item 
discrimination may be viable. 


Tables 17 and 18 summarize discrimination and difficulty parameter estimates and correlations
between them under the 2PL/GPC for ELA and Math items administered on-grade. These summary
statistics are provided for the overall set of items as well as groups of items characterized by item
type (SR/CR), score categories (number of discrete possible score points), and claim areas. Figures
B.1 and B.2 (see footnote 3) in Appendix B present, for ELA and Math and at each grade level, plots
of item discrimination versus item difficulty under the 2PL/GPC with item type, score category, and
claim area highlighted for each item. Results show that for the 2PL/GPC model there is a moderate
negative correlation between item difficulty and discrimination for ELA. There is less evidence of
either a positive or a negative correlation between item difficulty and discrimination for Math items.


Tables 17 and 18 also show sizable standard deviations for discrimination parameter estimates, 
above 0.20 for all subjects and grade levels, which indicate a substantially wide range of
discrimination parameter estimates for the items in the pool. The average discriminations vary 
somewhat, but not considerably, across item groupings. The CR items were slightly more 
discriminating on average than SR items. The pattern of item discrimination across different 
numbers of score categories was inconsistent across subjects. For ELA, items with 2 and 3 score 
categories had comparable discrimination, while items with 4 score categories generally had higher 
average discrimination (which might be due to local item dependence issues for PT items). For Math, 
the fewer the number of score categories, the higher the item discrimination. ELA items in Claim 
areas 2 and 4 had slightly higher average discriminations than items in claim areas 1 and 3 for most 
of the grade levels. Math items did not show a noticeable pattern of differential discrimination across
different claim areas.


An advantage of the 2PL/GPC in comparison to the 1PL/PC is that it would permit using items with a 
range of item discriminations, while the 1PL/PC could flag items with both very high and very low 
discriminations for exhibiting poor fit and requiring further content review. 


3 These plots have inconsistent ranges because a common range would be unnecessarily wide for most
of the tests, as only a few tests have quite extreme difficulty and discrimination values. When
the range is wide, the scatter points all fall in the middle and it is difficult to identify any
patterns if they exist.



Table 17. Summary of 2PL/GPC Slope and Difficulty Estimates and Correlations for On-Grade ELA
Items

Item                         No. of    a Estimates Summary       b Estimates Summary            a and b
Grade  Item Grouping         Items     Mean  SD    Min   Max     Mean   SD       Min    Max     Correlation
3      Overall               231       0.63  0.23  0.15  1.24    0.32   1.22     -1.87  5.00    0.29
       Item Type: SR         76        0.64  0.25  0.16  1.23    -0.44  1.09     -1.87  5.00    -0.64
       Item Type: CR         155       0.62  0.22  0.15  1.24    0.69   1.11     -1.80  4.35    0.12
       Score Categories: 2   134       0.65  0.24  0.16  1.23    0.08   1.27     -1.87  5.00    0.39
       Score Categories: 3   91        0.56  0.20  0.15  1.09    0.61   1.09     -1.80  3.38    0.18
       Score Categories: 4   6         1.04  0.16  0.86  1.24    1.39   0.14     1.22   1.60    0.39
       Claim Area: 1         85        0.63  0.23  0.18  1.12    0.10   1.12     -1.84  3.14    0.51
       Claim Area: 2         64        0.68  0.26  0.18  1.24    0.33   0.99     -1.25  2.98    0.03
       Claim Area: 3         44        0.60  0.21  0.15  1.06    -0.22  0.96     -1.87  2.23    0.24
       Claim Area: 4         38        0.57  0.22  0.16  1.06    1.44   1.40     -1.80  5.00    0.41
4      Overall               216       0.57  0.23  0.20  1.40    0.33   1.21     -1.93  4.14    0.15
       Item Type: SR         111       0.54  0.21  0.20  1.24    -0.32  0.89     -1.93  2.18    0.59
       Item Type: CR         105       0.61  0.24  0.20  1.40    1.01   1.13     -1.28  4.14    0.06
       Score Categories: 2   148       0.56  0.21  0.20  1.24    0.00   1.12     -1.93  3.54    0.30
       Score Categories: 3   59        0.53  0.21  0.20  1.26    0.97   1.16     -1.28  4.14    0.11
       Score Categories: 4   9         1.02  0.25  0.73  1.40    1.48   0.44     1.01   2.00    0.91
       Claim Area: 1         78        0.58  0.22  0.20  1.24    -0.16  1.02     -1.85  2.48    0.48
       Claim Area: 2         58        0.62  0.25  0.27  1.40    0.42   1.07     -1.93  2.71    0.04
       Claim Area: 3         40        0.49  0.17  0.22  0.83    -0.05  0.91     -1.55  2.54    0.21
       Claim Area: 4         40        0.57  0.24  0.20  1.26    1.51   1.20     -0.89  4.14    0.10
5      Overall               171       0.61  0.20  0.19  1.15    0.34   1.21     -2.14  3.38    0.16
       Item Type: SR         77        0.57  0.21  0.19  1.05    0.46   0.84     -2.14  1.87    0.53
       Item Type: CR         94        0.63  0.18  0.20  1.15    1.00   1.06     -1.06  3.38    0.16
       Score Categories: 2   115       0.59  0.19  0.19  1.05    -0.01  1.15     -2.14  3.38    0.25
       Score Categories: 3   50        0.61  0.19  0.20  1.12    1.01   1.02     -1.01  2.96    0.16
       Score Categories: 4   6         0.80  0.26  0.57  1.15    1.51   0.63     0.80   2.14    0.80
       Claim Area: 1         55        0.56  0.18  0.19  0.92    0.21   1.15     -1.98  2.90    0.20
       Claim Area: 2         51        0.62  0.20  0.27  1.15    0.39   1.05     -1.74  2.75    0.07
       Claim Area: 3         30        0.62  0.20  0.28  1.05    -0.51  0.84     -2.14  1.42    0.59
       Claim Area: 4         33        0.65  0.21  0.20  1.12    1.31   1.18     -1.13  3.38    0.18
6      Overall               197       0.58  0.28  0.17  2.06    0.65   1.48     -1.79  8.05    0.10
       Item Type: SR         75        0.51  0.20  0.17  1.01    0.31   0.98     -1.79  2.65    0.54
       Item Type: CR         122       0.63  0.31  0.19  2.06    1.23   1.44     -1.26  8.05    0.16
       Score Categories: 2   128       0.58  0.25  0.17  1.34    0.41   1.47     -1.79  5.29    0.11
       Score Categories: 3   66        0.58  0.29  0.19  2.06    1.06   1.44     -1.26  8.05    0.15
       Score Categories: 4   3         1.09  0.61  0.59  1.77    1.59   0.24     1.40   1.86    0.46
       Claim Area: 1         77        0.55  0.19  0.19  1.04    0.52   1.27     -1.79  3.54    0.41
       Claim Area: 2         56        0.61  0.35  0.18  2.06    0.54   1.32     -1.34  4.80    0.02
       Claim Area: 3         [illeg.]  0.52  0.17  0.17  0.85    -0.38  0.96     -1.74  2.65    -0.46
       Claim Area: 4         [illeg.]  0.68  0.34  0.19  1.34    1.93   1.69     -0.29  8.05    -0.23
7      Overall               [illeg.]  0.53  0.21  0.11  1.18    0.57   1.34     -2.25  6.61    -0.30
       Item Type: SR         [illeg.]  0.52  0.24  0.19  1.18    -0.13  1.10     -2.25  3.29    -0.56
       Item Type: CR         [illeg.]  0.53  0.19  0.11  1.14    1.04   1.29     -1.76  6.61    -0.18
       Score Categories: 2   [illeg.]  0.55  0.22  0.19  1.18    0.26   1.38     -2.25  5.81    -0.32
       Score Categories: 3   [illeg.]  0.49  0.19  0.11  1.07    1.00   1.15     -1.32  6.61    -0.23
       Score Categories: 4   [illeg.]  0.61  0.08  0.51  0.72    1.68   0.38     1.24   2.10    0.26
       Claim Area: 1         [illeg.]  0.52  0.20  0.12  1.18    0.47   1.41     -2.21  5.81    -0.38
       Claim Area: 2         [illeg.]  0.52  0.16  0.21  0.96    0.40   1.23     -2.25  2.71    -0.14
       Claim Area: 3         [illeg.]  0.47  0.23  0.19  1.06    0.29   1.13     -1.90  3.29    -0.60
       Claim Area: 4         [illeg.]  0.61  0.23  0.11  1.14    1.42   1.31     -0.25  6.61    -0.38
8      Overall               [illeg.]  0.56  0.27  0.08  1.58    0.53   1.21     -2.87  6.17    -0.12
       Item Type: SR         [illeg.]  0.50  0.20  0.08  1.02    -0.11  0.98     -2.01  2.53    -0.50
       Item Type: CR         [illeg.]  0.59  0.30  0.13  1.58    0.93   [illeg.] -2.87  6.17    -0.11
       Score Categories: 2   [illeg.]  0.56  0.24  0.08  1.26    0.23   [illeg.] -2.01  6.17    -0.17
       Score Categories: 3   [illeg.]  0.49  0.24  0.13  1.25    0.93   [illeg.] -2.87  4.47    -0.19
       Score Categories: 4   [illeg.]  1.24  0.35  0.69  1.58    1.49   0.21     1.30   1.83    -0.26
       Claim Area: 1         [illeg.]  0.49  0.17  0.13  0.90    0.38   1.40     -2.01  6.17    -0.36
       Claim Area: 2         [illeg.]  0.64  0.35  0.18  1.58    0.36   1.02     -2.87  2.18    0.19
       Claim Area: 3         [illeg.]  0.47  0.21  0.17  1.02    0.44   1.16     -1.78  2.95    -0.61
       Claim Area: 4         [illeg.]  0.69  0.30  0.08  1.26    1.21   0.82     -0.53  3.30    -0.29
9      Overall               119       0.60  0.24  0.20  1.20    0.64   1.33     -2.24  6.04    0.01
       Item Type: SR         [illeg.]  0.54  0.20  0.22  0.99    -0.43  0.78     -1.60  1.21    -0.51
       Item Type: CR         [illeg.]  0.63  0.25  0.20  1.20    1.10   1.25     -2.24  6.04    -0.03
       Score Categories: 2   [illeg.]  0.58  0.23  0.20  1.08    0.38   1.36     -1.60  3.54    0.01
       Score Categories: 3   [illeg.]  0.60  0.26  0.21  1.20    0.89   1.28     -2.24  6.04    -0.06
       Score Categories: 4   [illeg.]  0.87  0.15  0.73  1.06    1.45   0.22     1.25   1.74    -0.44
       Claim Area: 1         [illeg.]  0.58  0.27  0.20  1.20    0.54   1.46     -2.24  6.04    -0.13
       Claim Area: 2         [illeg.]  0.65  0.20  0.29  1.00    0.46   1.11     -1.60  2.61    0.07
       Claim Area: 3         [illeg.]  0.49  0.19  0.28  0.99    -0.19  1.08     -1.23  2.12    -0.46
       Claim Area: 4         [illeg.]  0.67  0.23  0.22  1.10    1.55   0.86     -0.30  3.30    0.24
10     Overall               129       0.60  0.25  0.19  1.33    0.75   1.26     -1.78  4.70    -0.18
       Item Type: SR         [illeg.]  0.56  0.25  0.22  1.11    -0.10  0.92     -1.78  2.78    -0.55
       Item Type: CR         [illeg.]  0.63  0.24  0.19  1.33    1.23   [illeg.] -0.76  4.70    -0.16
       Score Categories: 2   [illeg.]  0.61  0.24  0.22  1.12    0.53   1.40     -1.78  4.70    -0.21
       Score Categories: 3   [illeg.]  0.57  0.24  0.19  1.32    1.02   1.01     -0.76  3.25    -0.17
       Score Categories: 4   [illeg.]  1.00  0.30  0.73  1.33    1.53   0.22     1.28   1.71    -0.99
       Claim Area: 1         [illeg.]  0.55  0.20  0.19  1.05    0.74   1.40     -1.78  4.70    -0.28
       Claim Area: 2         [illeg.]  0.73  0.28  0.22  1.33    0.90   1.18     -1.34  3.92    -0.15
       Claim Area: 3         [illeg.]  0.52  0.25  0.21  1.11    0.00   0.84     -1.21  1.91    -0.72
       Claim Area: 4         [illeg.]  0.65  0.26  0.23  1.30    1.32   0.96     -0.16  2.78    -0.14
11     Overall               [illeg.]  0.54  0.22  0.18  1.32    1.01   1.20     -1.97  5.09    -0.15
       Item Type: SR         [illeg.]  0.49  0.17  0.19  0.91    0.24   0.89     -1.97  2.85    -0.55
       Item Type: CR         [illeg.]  0.57  0.23  0.18  1.32    1.43   1.14     -1.29  5.09    -0.18
       Score Categories: 2   [illeg.]  0.54  0.20  0.19  1.21    0.75   1.26     -1.97  5.09    -0.09
       Score Categories: 3   [illeg.]  0.52  0.22  0.18  1.18    1.34   [illeg.] -0.68  4.71    -0.27
       Score Categories: 4   [illeg.]  0.89  0.28  0.69  1.32    1.36   0.10     1.26   1.50    -0.27
       Claim Area: 1         [illeg.]  0.47  0.20  0.19  1.19    1.20   1.26     -1.97  4.83    -0.22
       Claim Area: 2         [illeg.]  0.65  0.24  0.26  1.32    0.65   1.00     -1.25  2.92    0.04
       Claim Area: 3         [illeg.]  0.48  0.16  0.18  0.86    0.72   [illeg.] -1.29  4.71    -0.46
       Claim Area: 4         [illeg.]  0.65  0.20  0.26  1.21    1.49   1.13     -0.52  5.09    -0.02


Table 18. Summary of 2PL/GPC Slope and Difficulty Estimates and Correlations for On-Grade Math 


Items 
Jeet es)pameraelele)ia 
CT gelels, ol 
Overall 
ltem Type 
Score 


3 Categories 


Claim Area 


Overall 


Item Type 


score 
4 Categories 


Claim Area 


5 Overall 


Toone) j 


Items 


a Estimates Summary 


Kereta 


0.69 
0.65 
0.69 
0.75 
0.61 
0.50 
0.71 
0.65 
0.60 
0.60 
0.7/2 
0.63 
0:13 
0.78 
0.59 
0.50 
0./2 
0.70 
0./2 
0.70 


0.71 


SD 
0.21 
0.23 
0.21 
0.21 
0.16 
0.18 
0.21 
0.18 
0.19 
0.21 
0.25 
0.25 
0.25 
0.25 
0.17 
0.14 
0.24 
0.28 
0.31 
0.20 


0.26 


Min 
0.21 
0.21 
0.21 
0.21 
0.32 
0.21 
0.21 
0.37 
0.21 
0.28 
0.19 
0.19 
0.27 
0.19 
0.28 
0.28 
0.24 
0.19 
0.28 
0.50 
0.23 

AL 


b Estimates Summary 


Max Mean 
1.314 0.31 
1.21 -0.81 
1.31 0.53 
1.31 0.18 
0.98 0.45 
0.77 O.77 
1.31 0.20 
1.22 0.50 
0.86 1.00 
0.95 0.37 
1.32 0.72 
1.10 -0.09 
1.32 0.86 
1.32 0.70 
1.09 0.64 
0.77 1.32 
1.32 0.54 
1.22 1.23 
1.26 1.36 
1.08 1.41 
1.38 0.55 


SD 
1.43 
1.25 
1.36 
1.50 
1.28 
1.36 
1.43 
1.15 
1.79 
1.08 
1.20 
1.36 
1.12 
1.28 
1.03 
0.82 
1.23 
0.91 
0.98 
0.62 


1.10 


Min 
-4.06 
-4.06 
-3.16 
-4.06 
-2.77 
-1.68 
-4.06 
-1.26 
-2.77 
-1.68 
-3.42 
-1.91 
-3.42 
-3.42 
-1.66 

0.05 
-3.42 
-0.10 

0.01 

0.40 


-3.34 


Max 

4.42 
1.84 
4.42 
3.63 
4.42 
3.20 
3.63 
3.53 
4.42 
1.55 
3.97 
3.86 
3.97 
3.97 
2.45 
2.46 
3.58 
3.86 
3.97 
2.44 


4.17 


aand b 
Correlation 


0.01 
-0.33 
0.04 
0.05 
0.08 
0.28 
0.04 
0.02 
-0.16 
0.71 
0.01 
-0.58 
0.08 
-0.02 
0.30 
0.10 
0.09 
-0.30 
-0.26 
0.30 


0.17 


PILOT ANALYSIS SUMMARY OF RESULTS 


No. of 
Items 


a Estimates Summary 
Mean SD Min Max 


b Estimates Summary 
Mean SD Min Max 


aand b 
Correlation 


Item 
CT gslels, 


Item Grouping 


Item Type 


score 
Categories 


Claim Area 


Overall 


ltem Type 


score 
Categories 


Claim Area 


Overall 


Item Type 


score 
Categories 


Claim Area 


Overall 


Item Type 


score 
Categories 


Claim Area 


0.62 
0.73 
0.76 
0.60 
0.47 
0.71 
0.76 
0.56 
0.77 
0.70 
0:55 
0.74 
0.75 
0.61 
0.43 
0.68 
0.74 
0.63 
0.97 
0.66 
0.46 
0.70 
0.73 
0.60 
0.50 
0.67 
0.74 
0.54 
0.70 
0.65 
0.48 
0.69 
0.70 
0.57 
0.50 
0.63 
0.74 
0.65 
0./2 


0.21 
0.27 
Ozr 
0.20 
0.09 
0.25 
0.28 
0.24 
0.33 
0.27 
0.21 
0.27 
0.28 
0.18 
0.12 
0.27 
0.21 
0.26 
0.32 
0.26 
0.15 
0.26 
0.27 
0.22 
0.21 
0.26 
0.22 
0.26 
0.16 
0.27 
0.17 
0.28 
0.30 
0.20 
0.16 
0.27 
0.31 
0.16 
0.25 


0.23 
0.27 
0.23 
O27 
0.30 
0.23 
0.38 
0.30 
0.34 
0.19 
0.19 
0.20 
0.19 
0.20 
0.32 
0.19 
0.41 
0.30 
0.63 
0.15 
0.23 
0.15 
0.23 
0.15 
0.21 
0.15 
0.27 
0.21 
0.51 
0.13 
0.20 
0.13 
0.18 
0.13 
0.28 
0.13 
0.34 
0.48 
0.45 


42 


1.13 
1.38 
1.38 
dW 
0.58 
1.31 
1.38 
1.09 
1.43 
1.58 
1.10 
1.58 
1.58 
0.99 
0.64 
1.58 
LA7 
1.10 
1.50 
1.43 
0.91 
1.43 
1.43 
1.15 
0.96 
1.43 
1.17 
1.06 
0.96 
1.47 
0.76 
1.47 
1.47 
1.07 
0.82 
1.45 
1.47 
0.88 
1.04 


-O.11 
0.70 
0.47 
0.71 
0.82 
0.43 
1.04 
0.69 
1.00 
0.95 
0.21 
1.16 
0.94 
1.02 
0.74 
0.84 
1.03 
1.84 
1.84 
1.38 
0.82 
1.49 
1.38 
1.40 
1.25 
1.41 
0.90 
1.65 
1.44 
129 
0.79 
1.35 
1.30 
1.02 
1.40 
1.20 
1.58 
1.04 
1.50 


0.73 
1.12 
1.15 
0.99 
0.70 
1.12 
0.99 
0.85 
1.03 
1.19 
147 
1.12 
1.28 
0.99 
0.80 
421 
1.13 
0.13 
0.75 
1.19 
1.12 
1.18 
1.03 
1.40 
1.11 
1.17 
0.83 
1.65 
1.02 
Bes 
1.09 
1.16 
1.22 
1.06 
0.96 
1.17 
1.26 
1.00 
0.42 


-1.83 
-3.34 
-3.34 
-1.29 
-0.26 
-3.34 
-0.70 
-1.29 
-0.26 
-1.77 
-1./7/ 
-1.54 
-1.77 
-0.78 
-0.36 
-1./7 
-1.19 

0.63 

0.67 
-1.81 
-1.81 
-1.02 
-1.81 
-1.02 
-0.62 
-1.81 
-0.87 
-0.92 

0.29 
-1.49 
-0.99 
-1.49 
-1.20 
-1.49 

0.12 
-1.49 
-0.97 
-1.01 

0.97 


1./2 
4.17 
3.43 
4.17 
1.68 
4.17 
3.43 
1.7/2 
2.33 
4.09 
2.98 
4.09 
4.09 
3.69 
2.07 
4.09 
3.05 
3.10 
2.9/ 
6.38 
3.84 
6.38 
4.20 
6.38 
3.89 
6.38 
2.59 
5.46 
2.56 
DilZ 
4.54 
5:12 
5.12 
4.95 
3.04 
5.12 
5.12 
2.49 
1.85 


0.00 
0.14 
0.25 
0.04 
0.48 
0.19 
-0.01 
-0.15 
0.68 
0.01 
-0.55 
0.00 
0.05 
-0.23 
0.03 
-0.08 
0.48 
-0.05 
0.33 
-0.11 
-0.68 
-0.16 
0.11 
-0.45 
0.07 
-0.09 
0.22 
-0.33 
0.14 
-0.08 
-0.60 
-0.09 
-0.13 
-0.02 
-0.62 
-0.11 
-0.05 
-0.15 
-0.18 


Item 
CT gslels, 


10 


11 


Item Grouping 


Overall 


ltem Type 


Score 
Categories 


Claim Area 


Overall 


Item Type 


Score 
Categories 


Claim Area 


Overall 


Item Type 


score 
Categories 


Claim Area 


No. of 
Items 


PILOT ANALYSIS SUMMARY OF RESULTS 


a Estimates Summary 


WKexel a 
0.60 
0.46 
0.62 
0.68 
0.50 
0.33 
0.62 
0.47 
0.51 
0.52 
0.67 
0.48 
0.71 
0.81 
0.53 
0.37 
0.69 
0.67 
0.50 
0.59 
0.84 
0.45 
0.90 
0.90 
0.62 
0.59 
0.84 
0.85 
0.83 
0.92 


SD 
0.27 
0.20 
0.28 
0.28 
0.21 
0.09 
0.28 
0.22 
0.18 
0.08 
0.36 
0.22 
0.37 
0.40 
0.19 
OAT 
0.38 
0.22 
0.32 
0.29 
0.39 
0.21 
0.38 
0.40 
0.23 
0.24 
0.38 
0.49 
0.33 
0.54 


Min 
0.15 
0.21 
0.15 
0.21 
0.15 
0.23 
0.15 
0.20 
0.24 
0.47 
0.17 
0.18 
0.17 
0.18 
0.17 
0.17 
0.17 
0.26 
0.17 
0.36 
0.21 
0.21 
0.22 
0.21 
0.22 
0.35 
0.21 
0.24 
0.33 
0.34 


43 


Max 

1.42 
0.77 
1.42 
1.42 
1.02 
0.44 
1.42 
0.77 
0.69 
0.58 
1.76 
1.12 
1.76 
1.76 
0.91 
0.75 
1.76 
1.10 
1.33 
1.09 
2.20 
1.22 
2.20 
2.20 
1.19 
1.01 
2.18 
2.20 
1.68 
2.07 


b Estimates Summary 


Mean 
1.92 
0.99 
2.07 
1.89 
1.94 
2.13 
1.93 
1.77 
2.14 
17. 
1.32 
0.88 
1.40 
1.40 
1.22 
1.18 
1.13 
1.80 
1.99 
2.35 
2.18 
1.05 
2.33 
2.28 
1.78 
1.45 
2.09 
2.81 
1.74 
2.96 


SD 
eae | 
1.04 
1.25 
1.31 
1.18 
1.54 
1.31 
1.36 
0.84 
0.30 
1.10 
1.48 
1.00 
1.15 
1.00 
1.16 
1.08 
0.69 
1.06 
1.05 
1.29 
L2t 
1.21 
1.29 
1.19 
1.15 
1.31 
1.20 
0.98 
0.79 


Min 
-0.62 
-0.29 
-0.62 
-0.62 

0.29 
-0.44 
-0.62 
-0.44 

1.02 

1.50 
-1.15 
-0.71 
-1.15 
-0.71 
-1.15 
-0.36 
-1.15 
-0.02 

0.37 

1.27 
-1.11 
-1.06 
-1.11 
-1.06 
-0.87 
-1.11 
-1.11 
-0.53 

0.26 

1.61 


Max 

1.34 
3.76 
1.34 
1.34 
6.10 
3.80 
1.34 
4.21 
3.24 
1.93 
5.49 
5.49 
3.84 
5.49 
3.67 
3.235 
5.49 
2.54 
3.84 
3.67 
5.48 
3.63 
5.48 
5.48 
4.45 
2.54
5.48 
4.90 
4.10 
4.18 


a and b
Correlation 


0.00 
-0.62 
-0.01 

0.09 
-0.14 
-0.30 
-0.02 

0.18 
-0.11 
-1.00 

0.12 
-0.35 

0.15 

0.06 

0.19 

0.22 

0.21 

0.03 
-0.19 

0.03 
-0.04 
-0.43 
-0.17 
-0.10 
-0.16 

0.20 
-0.06 
-0.04 
-0.03 

0.25 


Smarter 
Balanced 


Assessment Consortium 


Ability Estimates Evaluation 


It is worthwhile to see how the ability estimates and scales vary among all three model combinations. 
It is reasonable to expect that the correlations of ability estimates will be very high across models, 
because for a given examinee, the same item responses are used for all three ability estimates; all 
that differs is 1) how the item responses are weighted, and 2) how the ability scales differ in terms of 
“stretching out” or “pushing in” various parts of the ability scale. 


Tables 19 and 20 summarize means and standard deviations of theta estimates and their 
correlations across different model combinations for ELA and Math, respectively. Figures B.3 and B.4 
present scatter plots of theta estimates under different model choices for ELA and Math, 
respectively. Results show that the ability estimates across all three models are highly correlated. 
The scatter plots show that 2PL/GPC produced ability estimates that were most similar to the 
3PL/GPC in the middle of the ability scale. Despite the difference between item parameter estimates 
produced by the 1PL/PC and the 3PL/GPC, the ability scale produced by the 1PL/PC is very similar to 
that produced by 3PL/GPC, and the two ability scales exhibit a linear relationship. 
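
The comparison summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the report: the function names `pearson` and `summarize` are hypothetical, and the theta values are invented for five examinees purely to show the shape of the computation (per-model mean and SD, plus pairwise correlations of the kind reported in Tables 19 and 20).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize(thetas):
    """Return per-model (mean, SD) and pairwise correlations of theta estimates."""
    summary = {}
    for name, vals in thetas.items():
        n = len(vals)
        mean = sum(vals) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
        summary[name] = (mean, sd)
    names = list(thetas)
    # Upper triangle (including the diagonal), matching the table layout.
    corr = {(a, b): pearson(thetas[a], thetas[b])
            for i, a in enumerate(names) for b in names[i:]}
    return summary, corr


# Hypothetical theta estimates for the same five examinees under each model.
thetas = {
    "1PL/PC":  [-1.2, -0.4, 0.0, 0.5, 1.3],
    "2PL/GPC": [-1.3, -0.3, 0.1, 0.4, 1.4],
    "3PL/GPC": [-1.1, -0.3, 0.1, 0.5, 1.5],
}
summary, corr = summarize(thetas)
for pair, r in corr.items():
    print(pair, round(r, 3))
```

Because every model scores the same item responses, correlations near 1 are expected; differences between models show up mainly in the scatter plots, as departures from a straight line at the ends of the scale.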


4 The three models do produce different scales when applied to multiple-choice data where it is
possible for very low ability students to correctly guess the keyed answer (Yen, 1981). 
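
The effect of guessing noted in this footnote can be illustrated with the standard logistic item response functions. The sketch below is an assumption-laden example: the parameter values are hypothetical, and the scaling constant sometimes written as D = 1.7 is omitted because the parameterization used for the Pilot analysis is not restated here.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response (no guessing parameter)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_3pl(theta, a, b, c):
    """3PL probability: lower asymptote c models guessing on multiple-choice items."""
    return c + (1.0 - c) * p_2pl(theta, a, b)


# Hypothetical item: discrimination 1.0, difficulty 0.0, guessing 0.25.
for theta in (-4.0, 0.0, 4.0):
    print(theta, round(p_2pl(theta, 1.0, 0.0), 3), round(p_3pl(theta, 1.0, 0.0, 0.25), 3))
```

For very low theta the 2PL curve approaches 0 while the 3PL curve approaches c, which is why the models can place low-ability examinees differently on multiple-choice items.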


Table 19. ELA Correlations of Ability Estimates across Different Model Combinations

                     Theta Summary         Theta Correlations
Grade  Model         Mean    SD            1PL/PC  2PL/GPC  3PL/GPC
3      1PL/PC       -0.02    1.10          1.00    0.99     0.98
       2PL/GPC      -0.01    1.10                  1.00     0.99
       3PL/GPC      -0.01    1.10                           1.00
4      1PL/PC       -0.01    1.13          1.00    0.98     0.97
       2PL/GPC       0.01    1.14                  1.00     0.99
       3PL/GPC       0.01    1.14                           1.00
5      1PL/PC       -0.01    1.14          1.00    0.98     0.97
       2PL/GPC       0.00    1.16                  1.00     0.98
       3PL/GPC       0.00    1.18                           1.00
6      1PL/PC       -0.01    1.16          1.00    0.98     0.97
       2PL/GPC       0.00    1.18                  1.00     0.99
       3PL/GPC      -0.01    1.19                           1.00
7      1PL/PC       -0.01    1.16          1.00    0.97     0.95
       2PL/GPC       0.01    1.19                  1.00     0.98
       3PL/GPC      -0.01    1.19                           1.00
8      1PL/PC       -0.01    [illegible]   1.00    0.98     0.97
       2PL/GPC       0.00    1.19                  1.00     0.99
       3PL/GPC       0.00    1.20                           1.00
9      1PL/PC       -0.01    [illegible]   1.00    0.97     0.96
       2PL/GPC       0.00    1.20                  1.00     0.99
       3PL/GPC      -0.01    1.21                           1.00
10     1PL/PC       -0.02    1.15          1.00    0.98     0.97
       2PL/GPC       0.00    1.15                  1.00     0.99
       3PL/GPC       0.00    1.15                           1.00
11     1PL/PC       -0.02    1.12          1.00    0.98     0.97
       2PL/GPC      -0.03    1.14                  1.00     0.98
       3PL/GPC      -0.04    1.15                           1.00


Table 20. Math Correlations of Ability Estimates across Different Model Combinations

                     Theta Summary         Theta Correlations
Grade  Model         Mean    SD            1PL/PC  2PL/GPC  3PL/GPC
3      1PL/PC       -0.01    1.10          1.00    0.99     0.98
       2PL/GPC      -0.03    1.11                  1.00     1.00
       3PL/GPC      -0.03    1.11                           1.00
4      1PL/PC       -0.01    1.07          1.00    0.99     0.98
       2PL/GPC      -0.04    1.09                  1.00     0.99
       3PL/GPC      -0.06    1.06                           1.00
5      1PL/PC       -0.02    1.09          1.00    0.99     0.97
       2PL/GPC      -0.04    1.11                  1.00     0.99
       3PL/GPC      -0.05    1.11                           1.00
6      1PL/PC        0.01    1.09          1.00    0.98     0.97
       2PL/GPC      -0.01    1.11                  1.00     0.99
       3PL/GPC       0.00    1.09                           1.00
7      1PL/PC        0.00    1.09          1.00    0.98     0.96
       2PL/GPC      -0.02    1.11                  1.00     0.98
       3PL/GPC      -0.05    1.06                           1.00
8      1PL/PC        0.01    1.09          1.00    0.97     0.96
       2PL/GPC      -0.01    1.12                  1.00     0.99
       3PL/GPC      -0.01    1.11                           1.00
9      1PL/PC        0.00    1.14          1.00    0.95     0.92
       2PL/GPC      -0.07    1.16                  1.00     0.96
       3PL/GPC      -0.15    1.13                           1.00
10     1PL/PC       -0.02    1.14          1.00    0.97     0.93
       2PL/GPC      -0.09    1.13                  1.00     0.97
       3PL/GPC      -0.27    1.03                           1.00
11     1PL/PC        0.06    1.01          1.00    0.95     0.92
       2PL/GPC      -0.08    1.01                  1.00     0.98
       3PL/GPC      -0.08    0.95                           1.00




2.4 IRT Model Recommendation 


Based on the model comparison analysis results, ETS recommends that the 2PL/GPC model be 
adopted as the IRT model combination for calibrating Smarter Balanced items and establishing a 
vertical scale. The 2PL/GPC model provides flexibility for estimating a range of item discriminations 
without the complications of implementing a 3PL/GPC model. The major limitation of the 2PL/GPC 
model in this setting is that it has not been previously used for vertical scaling in K-12 assessments. 
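
As a rough illustration of what the 2PL/GPC combination involves, the sketch below computes category probabilities under the generalized partial credit model (Muraki, 1992). It is an assumed textbook parameterization, not necessarily the one used in the calibration software: `gpc_probs` and its argument names are hypothetical, the step parameters are invented, and no scaling constant is applied. With a single step the model reduces to the 2PL, and fixing the slope at 1 gives the partial credit model (Masters, 1982).

```python
import math


def gpc_probs(theta, a, steps):
    """Category probabilities for scores 0..m under the generalized partial credit model.

    theta : examinee ability
    a     : item slope (discrimination)
    steps : list of m step parameters b_1..b_m
    """
    # Cumulative sums of a * (theta - b_v); the score-0 category has sum 0.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical 4-category item (three steps) and a dichotomous item (one step).
print([round(p, 3) for p in gpc_probs(0.0, 0.8, [-0.5, 0.7, 1.9])])
print([round(p, 3) for p in gpc_probs(0.3, 1.2, [0.5])])
```

Letting the slope vary by item is the flexibility referred to above; the model still has no lower asymptote, which is what separates it from the 3PL/GPC combination.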


This recommendation should be evaluated with caution given the experimental nature of the Pilot
data, the possible change of item format from Pilot to Field Test to operational administration, and
the lack of information about vertical scaling results for the three models.




References 


Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In
B. N. Petrov & F. Csaki (Eds.), Proceedings of the 2nd International Symposium on Information
Theory (pp. 267-281). Budapest, Hungary: Akademiai Kiado.


Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K. Hambleton
(Eds.), Handbook of Modern Item Response Theory (pp. 433-448). New York: Springer-Verlag.


Briggs, D. C., & Weeks, J. P. (2009). The sensitivity of value-added modeling to the creation of a
vertical score scale. Education Finance and Policy, 4(4), 384-414.


Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female
candidates on SAT and TSWE forms administered in December 1977: An application of the
standardization approach (RR-83-9). Princeton, NJ: Educational Testing Service.


Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to
assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal
of Educational Measurement, 23, 355-368.


Dorans, N. J., & Schmitt, A. P. (1991). Constructed response and differential item functioning: A
pragmatic approach (RR-91-47). Princeton, NJ: Educational Testing Service.


Dorans, N. J., & Schmitt, A. P. (1993). Constructed response and differential item functioning: A
pragmatic perspective. In R. E. Bennett & W. C. Ward (Eds.), Construction versus choice in
cognitive measurement (pp. 135-165). Hillsdale, NJ: Erlbaum.


Fitzpatrick, A. R., Link, V., Yen, W. M., Burket, G. R., Ito, K., & Sykes, R. (1996). Scaling performance
assessments: A comparison of one-parameter and two-parameter Partial Credit Models.
Journal of Educational Measurement, 33, 291-314.


Holland, P. W., & Thayer, D. (1988). Differential item performance and the Mantel-Haenszel
procedure. In H. Wainer & H. I. Braun (Eds.), Test Validity (pp. 129-145). Hillsdale, NJ:
Lawrence Erlbaum Associates.


Mantel, N., & Haenszel, W. M. (1959). Statistical aspects of the analysis of data from retrospective
studies of disease. Journal of the National Cancer Institute, 22, 719-748.


Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.


McKinley, R. L., & Reckase, M. D. (1983). An application of a multidimensional extension of the
two-parameter logistic latent trait model (ONR-83-3). (ERIC Document Reproduction Service No.
ED 240 168)


Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied
Psychological Measurement, 16, 159-176.



Reckase, M. D., Martineau, J. A., & Kim, J. P. (2000, July). A vector approach to determining the
number of dimensions needed to represent a set of variables. Paper presented at the annual
meeting of the Psychometric Society, Vancouver, Canada.


Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building unidimensional tests using
multidimensional items. Journal of Educational Measurement, 25, 193-203.


Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.


Sykes, R. C., & Yen, W. M. (2000). The scaling of mixed-item format tests with the one-parameter
and two-parameter partial credit models. Journal of Educational Measurement, 37(3), 221-244.


Weeks, J. P. (2010). Plink: An R Package for Linking Mixed-Format Tests Using IRT-Based Methods.
Journal of Statistical Software, 35(12), 1-33. URL http://www.jstatsoft.org/v35/i12/


Yao, L. (2003). BMIRT: Bayesian multivariate item response theory [Computer software]. Monterey,
CA: Defense Manpower Data Center.


Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and
test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30,
469-492.


Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological
Measurement, 5, 245-262.


Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item
dependence. Journal of Educational Measurement, 30, 187-214.


Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational
Measurement (Fourth Edition). Westport, CT: American Council on Education and Praeger
Publishing.


Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of differential item functioning for
performance tasks. Journal of Educational Measurement, 30(3), 233-251.




Appendix A: Item Vector Plots 


Figure A.1. Item Vector Plot for ELA Grade 3 (Within Grade)

Figure A.2. Item Vector Plot for ELA Grade 4 (Within Grade)

Figure A.3. Item Vector Plot for ELA Grade 5 (Within Grade)

Figure A.4. Item Vector Plot for ELA Grade 6 (Within Grade)

Figure A.5. Item Vector Plot for ELA Grade 7 (Within Grade)

Figure A.6. Item Vector Plot for ELA Grade 8 (Within Grade)

Figure A.7. Item Vector Plot for ELA Grade 9 (Within Grade)

Figure A.8. Item Vector Plot for ELA Grade 10 (Within Grade)

Figure A.9. Item Vector Plot for ELA Grade 11 (Within Grade)

Figure A.10. Item Vector Plot for ELA Grades 3 and 4 (Across Grades)

Figure A.11. Item Vector Plot for ELA Grades 4 and 5 (Across Grades)

Figure A.12. Item Vector Plot for ELA Grades 5 and 6 (Across Grades)

Figure A.13. Item Vector Plot for ELA Grades 6 and 7 (Across Grades)

Figure A.14. Item Vector Plot for ELA Grades 7 and 8 (Across Grades)

Figure A.15. Item Vector Plot for ELA Grades 8 and 9 (Across Grades)

Figure A.16. Item Vector Plot for ELA Grades 9 and 10 (Across Grades)

Figure A.17. Item Vector Plot for ELA Grades 10 and 11 (Across Grades)

Figure A.18. Item Vector Plots for the Subset of ELA Grade 3 and 4 Vertical Linking Items

Figure A.19. Item Vector Plots for the Subset of ELA Grade 4 and 5 Vertical Linking Items

Figure A.20. Item Vector Plots for the Subset of ELA Grade 5 and 6 Vertical Linking Items

Figure A.21. Item Vector Plots for the Subset of ELA Grade 6 and 7 Vertical Linking Items

Figure A.22. Item Vector Plots for the Subset of ELA Grade 7 and 8 Vertical Linking Items

Figure A.23. Item Vector Plots for the Subset of ELA Grade 8 and 9 Vertical Linking Items

Figure A.24. Item Vector Plots for the Subset of ELA Grade 9 and 10 Vertical Linking Items

Figure A.25. Item Vector Plots for the Subset of ELA Grade 10 and 11 Vertical Linking Items

Figure A.26. Item Vector Plot for Math Grade 3 (Within Grade)

Figure A.27. Item Vector Plot for Math Grade 4 (Within Grade)

Figure A.28. Item Vector Plot for Math Grade 5 (Within Grade)

Figure A.29. Item Vector Plot for Math Grade 6 (Within Grade)

Figure A.30. Item Vector Plot for Math Grade 7 (Within Grade)

Figure A.31. Item Vector Plot for Math Grade 8 (Within Grade)

Figure A.32. Item Vector Plot for Math Grade 9 (Within Grade)

Figure A.33. Item Vector Plot for Math Grade 10 (Within Grade)

Figure A.34. Item Vector Plot for Math Grade 11 (Within Grade)

Figure A.35. Item Vector Plot for Math Grades 3 and 4 (Across Grades)

Figure A.36. Item Vector Plot for Math Grades 4 and 5 (Across Grades)

Figure A.37. Item Vector Plot for Math Grades 5 and 6 (Across Grades)

Figure A.38. Item Vector Plot for Math Grades 6 and 7 (Across Grades)

Figure A.39. Item Vector Plot for Math Grades 7 and 8 (Across Grades)

Figure A.40. Item Vector Plot for Math Grades 8 and 9 (Across Grades)

Figure A.41. Item Vector Plot for Math Grades 9 and 10 (Across Grades)

Figure A.42. Item Vector Plot for Math Grades 10 and 11 (Across Grades)

Figure A.43. Item Vector Plot for the Subset of Math Grade 3 and 4 Vertical Linking Items

Figure A.44. Item Vector Plots for the Subset of Math Grade 4 and 5 Vertical Linking Items

Figure A.45. Item Vector Plots for the Subset of Math Grade 5 and 6 Vertical Linking Items

Figure A.46. Item Vector Plots for the Subset of Math Grade 6 and 7 Vertical Linking Items

Figure A.47. Item Vector Plots for the Subset of Math Grade 7 and 8 Vertical Linking Items

Figure A.48. Item Vector Plots for the Subset of Math Grade 8 and 9 Vertical Linking Items

Figure A.49. Item Vector Plots for the Subset of Math Grade 9 and 10 Vertical Linking Items

Figure A.50. Item Vector Plots for the Subset of Math Grade 10 and 11 Vertical Linking Items




Appendix B: Tables and Figures IRT Model Comparison 


Figure B.1. Scatter Plot of ELA 2PL/GPC Slope and Difficulty Estimates by Item Type, Score Category
and Claim

[Panels: ELA G03 through G11, Scatter Plot of 2PL/GPC a and b, each shown by Item Type, by Score
Category (2, 3, or 4 categories), and by Claim (Claims 1 to 4). Horizontal axis: b Parameter
Estimates. Vertical axis: a Parameter Estimates.]


Figure B.2. Scatter Plot of Math 2PL/GPC Slope and Difficulty Estimates by Item Type, Score
Category and Claim

[Panels: MATH G03 through G11, Scatter Plot of 2PL/GPC a and b, each shown by Item Type (CR, MC),
by Score Category (2, 3, or 4 categories), and by Claim (Claims 1 to 4). Horizontal axis:
b Parameter Estimates. Vertical axis: a Parameter Estimates.]




Figure B.3. ELA Scatter Plots of Theta Estimates across Different Model Combinations

[Panels: ELA G03 through G11, Scatter Plot of Theta Estimates, each shown for 1PL/PC vs 2PL/GPC,
1PL/PC vs 3PL/GPC, and 2PL/GPC vs 3PL/GPC. Axes: theta estimates under the two models being
compared, each running from -6 to 6.]


89 


Smarter 


Assessment Consortium 


PILOT ANALYSIS SUMMARY OF RESULTS 


Figure B.4. Math Scatter Plots of Theta Estimates across Different Model Combinations 


[Figure B.4 panels: for each of Math Grades 3 through 11, three scatter plots of theta estimates: 1PL/PC vs 2PL/GPC, 1PL/PC vs 3PL/GPC, and 2PL/GPC vs 3PL/GPC.]




Table B.1. ELA Items Receiving Pre-treatment before Calibration Based on Data Cleaning Procedure


Admin  Item   Item
Grade  Grade  Number   CAT/PT  Claim  Pre-Treatment
3      3      52402    CAT     1      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
3      3      52411    CAT     1      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
3      3      53779    CAT     2      Dropped as suggested by content review
3      3      53801    CAT     2      Dropped due to low item-total correlation
3      3      53925    CAT     3      Dropped as suggested by content review
3      3      54099    CAT     1      Dropped due to low item-total correlation
3      3      54163    CAT     1      Dropped due to low item-total correlation
3      3      54219    CAT     1      Dropped due to low item-total correlation
3      3      54223    CAT     4      Dropped due to low item-total correlation
3      3      54253    CAT     2      Dropped due to low item-total correlation
3      3      54303    CAT     1      Dropped due to low item-total correlation
3      3      54319    CAT     3      Dropped due to low item-total correlation
3      3      56130A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
3      3      56130B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
3      3      56133    PT      4      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
3      3      56134A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
3      3      56134B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
3      3      56194A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
3      3      56194B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
3      3      56199A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
3      3      56199B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
3      3      56325A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
3      3      56325B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
3      4      54616    CAT     4      Dropped due to low item-total correlation
3      4      54982    CAT     3      Dropped as suggested by content review
3      4      56186    PT      4      Dropped due to no scored responses
3      4      56187    PT      4      Dropped due to no scored responses
3      4      56188A   PT      2      Dropped due to no scored responses
3      4      56188B   PT      2      Dropped due to no scored responses
3      4      56188C   PT      2      Dropped due to no scored responses
3      4      56244    PT      4      Dropped due to no scored responses


3      4      56245    PT      ?      Dropped due to no scored responses
3      4      56247    PT      ?      Dropped due to no scored responses
3      4      56248A   PT      ?      Dropped due to no scored responses
3      4      56248B   PT      ?      Dropped due to no scored responses
3      4      56248C   PT      ?      Dropped due to no scored responses
3      4      56258    PT      ?      Dropped due to no scored responses
3      4      56259    PT      ?      Dropped due to no scored responses
3      4      56261    PT      ?      Dropped due to no scored responses
3      4      56263A   PT      ?      Dropped due to no scored responses
3      4      56263B   PT      ?      Dropped due to no scored responses
3      4      56263C   PT      ?      Dropped due to no scored responses
3      4      56299    PT      ?      Dropped due to no scored responses
3      4      56302    PT      ?      Dropped due to no scored responses
3      4      56309A   PT      ?      Dropped due to no scored responses
3      4      56309B   PT      ?      Dropped due to no scored responses
3      4      56309C   PT      ?      Dropped due to no scored responses
3      4      56311    PT      ?      Dropped due to no scored responses
3      4      56312    PT      ?      Dropped due to no scored responses
3      4      56313A   PT      ?      Dropped due to no scored responses
3      4      56313B   PT      ?      Dropped due to no scored responses
3      4      56313C   PT      ?      Dropped due to no scored responses
3      4      56461    PT      ?      Dropped due to no scored responses
3      4      56462    PT      ?      Dropped due to no scored responses
3      4      56463    PT      ?      Dropped due to no scored responses
3      4      56464A   PT      ?      Dropped due to no scored responses
3      4      56464B   PT      ?      Dropped due to no scored responses
3      4      56464C   PT      ?      Dropped due to no scored responses
3      4      56468    PT      ?      Dropped due to no scored responses
4      3      52402    CAT     ?      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
4      3      53925    CAT     ?      Dropped as suggested by content review
4      3      54099    CAT     ?      Dropped due to low item-total correlation
4      3      54253    CAT     ?      Dropped due to low item-total correlation
4      3      54303    CAT     ?      Dropped due to low item-total correlation
4      3      56126    PT      ?      Dropped due to no scored responses
4      3      56128    PT      ?      Dropped due to no scored responses
4      3      56130A   PT      ?      Dropped due to no scored responses
4      3      56130B   PT      ?      Dropped due to no scored responses
4      3      56130C   PT      ?      Dropped due to no scored responses
4      3      56133    PT      ?      Dropped due to no scored responses
4      3      56134A   PT      ?      Dropped due to no scored responses
4      3      56134B   PT      ?      Dropped due to no scored responses
4      3      56134C   PT      ?      Dropped due to no scored responses
4      3      56189    PT      ?      Dropped due to no scored responses


4      3      56192    PT      ?      Dropped due to no scored responses
4      3      56194A   PT      ?      Dropped due to no scored responses
4      3      56194B   PT      ?      Dropped due to no scored responses
4      3      56194C   PT      ?      Dropped due to no scored responses
4      3      56197    PT      ?      Dropped due to no scored responses
4      3      56198    PT      ?      Dropped due to no scored responses
4      3      56199A   PT      ?      Dropped due to no scored responses
4      3      56199B   PT      ?      Dropped due to no scored responses
4      3      56199C   PT      ?      Dropped due to no scored responses
4      3      56324    PT      ?      Dropped due to no scored responses
4      3      56325A   PT      ?      Dropped due to no scored responses
4      3      56325B   PT      ?      Dropped due to no scored responses
4      3      56325C   PT      ?      Dropped due to no scored responses
4      3      56390    PT      ?      Dropped due to no scored responses
4      3      56410    PT      ?      Dropped due to no scored responses
4      3      56411    PT      ?      Dropped due to no scored responses
4      3      56467    PT      ?      Dropped due to no scored responses
4      4      54490    CAT     ?      Dropped due to low item-total correlation
4      4      54500    CAT     ?      Dropped as suggested by content review
4      4      54540    CAT     ?      Dropped as suggested by content review
4      4      54568    CAT     ?      Dropped as suggested by content review
4      4      54580    CAT     ?      Dropped as suggested by content review
4      4      54588    CAT     ?      Dropped as suggested by content review
4      4      54616    CAT     ?      Dropped due to low item-total correlation
4      4      54634    CAT     ?      Dropped due to low item-total correlation
4      4      54982    CAT     ?      Dropped as suggested by content review
4      4      55023    CAT     ?      Dropped due to low item-total correlation
4      4      55025    CAT     ?      Dropped due to low item-total correlation
4      4      55027    CAT     ?      Dropped due to low item-total correlation
4      4      55350    CAT     ?      Dropped as suggested by content review
4      4      55368    CAT     ?      Dropped as suggested by content review
4      4      55444    CAT     ?      Dropped as suggested by content review
4      4      55667    CAT     ?      Dropped as suggested by content review
4      4      55688    CAT     ?      Dropped due to low item-total correlation
4      4      55738    CAT     ?      Dropped as suggested by content review
4      4      55742    CAT     ?      Dropped as suggested by content review
4      4      56188A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56188B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56248A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56248B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses

4      4      56263A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56263B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56309A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56309B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56313A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56313B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
4      4      56462    PT      ?      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
4      4      56464A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
4      4      56464B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
4      5      54674    CAT     ?      Dropped due to low item-total correlation
4      5      54676    CAT     ?      Dropped as suggested by content review
4      5      55099    PT      ?      Dropped due to no scored responses
4      5      55105    PT      ?      Dropped due to no scored responses
4      5      55109    PT      ?      Dropped due to no scored responses
4      5      55110A   PT      ?      Dropped due to no scored responses
4      5      55110B   PT      ?      Dropped due to no scored responses
4      5      55110C   PT      ?      Dropped due to no scored responses
4      5      55542    PT      ?      Dropped due to no scored responses
4      5      55544    PT      ?      Dropped due to no scored responses
4      5      55545    PT      ?      Dropped due to no scored responses
4      5      55547A   PT      ?      Dropped due to no scored responses
4      5      55547B   PT      ?      Dropped due to no scored responses
4      5      55547C   PT      ?      Dropped due to no scored responses
4      5      56191    PT      ?      Dropped due to no scored responses
4      5      56193    PT      ?      Dropped due to no scored responses
4      5      56195    PT      ?      Dropped due to no scored responses
4      5      56196A   PT      ?      Dropped due to no scored responses
4      5      56196B   PT      ?      Dropped due to no scored responses
4      5      56196C   PT      ?      Dropped due to no scored responses
4      5      56271    PT      ?      Dropped due to no scored responses
4      5      56272    PT      ?      Dropped due to no scored responses
4      5      56273    PT      ?      Dropped due to no scored responses
4      5      56274A   PT      ?      Dropped due to no scored responses
4      5      56274B   PT      ?      Dropped due to no scored responses
4      5      56274C   PT      ?      Dropped due to no scored responses
4      5      56320    PT      ?      Dropped due to no scored responses


4      5      56321    PT      ?      Dropped due to no scored responses
4      5      56322A   PT      ?      Dropped due to no scored responses
4      5      56322B   PT      ?      Dropped due to no scored responses
4      5      56322C   PT      ?      Dropped due to no scored responses
4      5      56469    PT      ?      Dropped due to no scored responses
5      4      54490    CAT     ?      Dropped due to low item-total correlation
5      4      54568    CAT     ?      Dropped as suggested by content review
5      4      55667    CAT     ?      Dropped as suggested by content review
5      4      55738    CAT     ?      Dropped as suggested by content review
5      4      56186    PT      ?      Dropped due to no scored responses
5      4      56187    PT      ?      Dropped due to no scored responses
5      4      56188A   PT      ?      Dropped due to no scored responses
5      4      56188B   PT      ?      Dropped due to no scored responses
5      4      56188C   PT      ?      Dropped due to no scored responses
5      4      56244    PT      ?      Dropped due to no scored responses
5      4      56245    PT      ?      Dropped due to no scored responses
5      4      56247    PT      ?      Dropped due to no scored responses
5      4      56248A   PT      ?      Dropped due to no scored responses
5      4      56248B   PT      ?      Dropped due to no scored responses
5      4      56248C   PT      ?      Dropped due to no scored responses
5      4      56258    PT      ?      Dropped due to no scored responses
5      4      56259    PT      ?      Dropped due to no scored responses
5      4      56261    PT      ?      Dropped due to no scored responses
5      4      56263A   PT      ?      Dropped due to no scored responses
5      4      56263B   PT      ?      Dropped due to no scored responses
5      4      56263C   PT      ?      Dropped due to no scored responses
5      4      56299    PT      ?      Dropped due to no scored responses
5      4      56302    PT      ?      Dropped due to no scored responses
5      4      56309A   PT      ?      Dropped due to no scored responses
5      4      56309B   PT      ?      Dropped due to no scored responses
5      4      56309C   PT      ?      Dropped due to no scored responses
5      4      56311    PT      ?      Dropped due to no scored responses
5      4      56312    PT      ?      Dropped due to no scored responses
5      4      56313A   PT      ?      Dropped due to no scored responses
5      4      56313B   PT      ?      Dropped due to no scored responses
5      4      56313C   PT      ?      Dropped due to no scored responses
5      4      56461    PT      ?      Dropped due to no scored responses
5      4      56462    PT      ?      Dropped due to no scored responses
5      4      56463    PT      ?      Dropped due to no scored responses
5      4      56464A   PT      ?      Dropped due to no scored responses
5      4      56464B   PT      ?      Dropped due to no scored responses
5      4      56464C   PT      ?      Dropped due to no scored responses
5      4      56468    PT      ?      Dropped due to no scored responses
5      5      52315    CAT     ?      Dropped due to low item-total correlation


5      5      53652    CAT     ?      Dropped as suggested by content review
5      5      53663    CAT     ?      Dropped due to low item-total correlation
5      5      54674    CAT     ?      Dropped due to low item-total correlation
5      5      54676    CAT     ?      Dropped as suggested by content review
5      5      54764    CAT     ?      Dropped as suggested by content review
5      5      54858    CAT     ?      Dropped due to low item-total correlation
5      5      54922    CAT     ?      Dropped due to low item-total correlation
5      5      54940    CAT     ?      Dropped due to low item-total correlation
5      5      55110A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      55110B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
5      5      55547A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      55547B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      56196A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      56196B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      56274A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      56274B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
5      5      56322A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      5      56322B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
5      6      50577    ?       ?      Dropped due to no scored responses
5      6      52257    PT      ?      Dropped due to no scored responses
5      6      52268    PT      ?      Dropped due to no scored responses
5      6      52269    PT      ?      Dropped due to no scored responses
5      6      52390    PT      ?      Dropped due to no scored responses
5      6      52398A   PT      ?      Dropped due to no scored responses
5      6      52398B   PT      ?      Dropped due to no scored responses
5      6      52398C   PT      ?      Dropped due to no scored responses
5      6      52645    CAT     ?      Dropped as suggested by content review
5      6      52647    CAT     ?      Dropped as suggested by content review
5      6      52689    CAT     ?      Dropped due to low item-total correlation
5      6      52801    CAT     ?      Dropped as suggested by content review
5      6      52849    CAT     ?      Dropped as suggested by content review
5      6      52855    CAT     ?      Dropped as suggested by content review
5      6      53021    PT      ?      Dropped due to no scored responses
5      6      53022    ?       ?      Dropped due to no scored responses


5      6      53023    PT      4      Dropped due to no scored responses
5      6      53024A   PT      2      Dropped due to no scored responses
5      6      53024B   PT      2      Dropped due to no scored responses
5      6      53024C   PT      2      Dropped due to no scored responses
5      6      55085    PT      4      Dropped due to no scored responses
5      6      55086    PT      4      Dropped due to no scored responses
5      6      55088A   PT      2      Dropped due to no scored responses
5      6      55088B   PT      2      Dropped due to no scored responses
5      6      55088C   PT      2      Dropped due to no scored responses
5      6      55089    PT      4      Dropped due to no scored responses
5      6      55090    PT      4      Dropped due to no scored responses
5      6      55092    PT      4      Dropped due to no scored responses
5      6      55093    PT      4      Dropped due to no scored responses
5      6      55094A   PT      2      Dropped due to no scored responses
5      6      55094B   PT      2      Dropped due to no scored responses
5      6      55094C   PT      2      Dropped due to no scored responses
5      6      55095    PT      4      Dropped due to no scored responses
5      6      55098    PT      4      Dropped due to no scored responses
5      6      55103A   PT      2      Dropped due to no scored responses
5      6      55103B   PT      2      Dropped due to no scored responses
5      6      55103C   PT      2      Dropped due to no scored responses
5      6      55631    PT      4      Dropped due to no scored responses
5      6      55922    PT      4      Dropped due to no scored responses
5      6      55923    PT      4      Dropped due to no scored responses
5      6      55925    PT      4      Dropped due to no scored responses
5      6      55926    PT      4      Dropped due to no scored responses
5      6      55927A   PT      4      Dropped due to no scored responses
5      6      55927B   PT      4      Dropped due to no scored responses
5      6      55927C   PT      4      Dropped due to no scored responses
5      6      56012    PT      4      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
5      6      56121    PT      4      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
6      5      52315    CAT     3      Dropped due to low item-total correlation
6      5      53652    CAT     1      Dropped as suggested by content review
6      5      53663    CAT     1      Dropped due to low item-total correlation
6      5      55099    PT      4      Dropped due to no scored responses
6      5      55105    PT      4      Dropped due to no scored responses
6      5      55109    PT      4      Dropped due to no scored responses
6      5      55110A   PT      2      Dropped due to no scored responses
6      5      55110B   PT      2      Dropped due to no scored responses
6      5      55110C   PT      2      Dropped due to no scored responses
6      5      55542    PT      4      Dropped due to no scored responses
6      5      55544    PT      4      Dropped due to no scored responses
6      5      55545    PT      4      Dropped due to no scored responses
6      5      55547A   PT      2      Dropped due to no scored responses


6      5      55547B   PT      ?      Dropped due to no scored responses
6      5      55547C   PT      ?      Dropped due to no scored responses
6      5      56191    PT      ?      Dropped due to no scored responses
6      5      56193    PT      ?      Dropped due to no scored responses
6      5      56195    PT      ?      Dropped due to no scored responses
6      5      56196A   PT      ?      Dropped due to no scored responses
6      5      56196B   PT      ?      Dropped due to no scored responses
6      5      56196C   PT      ?      Dropped due to no scored responses
6      5      56252    PT      ?      Dropped due to no scored responses
6      5      56271    PT      ?      Dropped due to no scored responses
6      5      56272    PT      ?      Dropped due to no scored responses
6      5      56273    PT      ?      Dropped due to no scored responses
6      5      56274A   PT      ?      Dropped due to no scored responses
6      5      56274B   PT      ?      Dropped due to no scored responses
6      5      56274C   PT      ?      Dropped due to no scored responses
6      5      56320    PT      ?      Dropped due to no scored responses
6      5      56321    PT      ?      Dropped due to no scored responses
6      5      56322A   PT      ?      Dropped due to no scored responses
6      5      56322B   PT      ?      Dropped due to no scored responses
6      5      56322C   PT      ?      Dropped due to no scored responses
6      5      56469    PT      ?      Dropped due to no scored responses
6      6      46544    CAT     ?      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
6      6      47824    CAT     ?      Dropped as suggested by content review
6      6      47844    CAT     ?      Dropped as suggested by content review
6      6      48230    CAT     ?      Dropped as suggested by content review
6      6      48333    CAT     ?      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
6      6      48350    CAT     ?      Dropped as suggested by content review
6      6      48701    CAT     ?      Dropped as suggested by content review
6      6      48799    CAT     ?      Dropped as suggested by content review
6      6      48801    CAT     ?      Dropped due to low item-total correlation
6      6      52398A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      52398B   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      52645    CAT     ?      Dropped as suggested by content review
6      6      52647    CAT     ?      Dropped as suggested by content review
6      6      52673    CAT     ?      Dropped due to low item-total correlation
6      6      52689    CAT     ?      Dropped due to low item-total correlation
6      6      52707    CAT     ?      Dropped due to low item-total correlation
6      6      52712    CAT     ?      Dropped due to low item-total correlation
6      6      52716    CAT     ?      Dropped due to low item-total correlation
6      6      52718    CAT     ?      Dropped due to low item-total correlation


6      6      52766    CAT     2      Dropped due to low item-total correlation
6      6      52783    CAT     2      Dropped due to low item-total correlation
6      6      52791    CAT     2      Dropped due to low item-total correlation
6      6      52801    CAT     2      Dropped as suggested by content review
6      6      52825    CAT     2      Dropped due to low item-total correlation
6      6      52849    CAT     3      Dropped as suggested by content review
6      6      52855    CAT     3      Dropped as suggested by content review
6      6      52859    CAT     3      Dropped as suggested by content review
6      6      52873    CAT     3      Dropped due to low item-total correlation
6      6      52895    CAT     4      Dropped due to low item-total correlation
6      6      53024A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
6      6      53024B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
6      6      55088A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      55088B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      55094A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      55094B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
6      6      55103A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
6      6      55103B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
6      6      55927A   PT      4      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      55927B   PT      4      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
6      6      56121    PT      4      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
6      7      46442    CAT     3      Dropped as suggested by content review
6      7      46454    CAT     3      Dropped due to low item-total correlation
6      7      47493    CAT     1      Dropped as suggested by content review
6      7      47888    CAT     2      Dropped as suggested by content review
6      7      52478    PT      4      Dropped due to no scored responses
6      7      52480    PT      4      Dropped due to no scored responses
6      7      52587A   PT      2      Dropped due to no scored responses
6      7      52587B   PT      2      Dropped due to no scored responses
6      7      52587C   PT      2      Dropped due to no scored responses
6      7      52780    PT      4      Dropped due to no scored responses
6      7      53018    PT      4      Dropped due to no scored responses
6      7      53019A   PT      2      Dropped due to no scored responses
6      7      53019B   PT      2      Dropped due to no scored responses


6      7      53019C   PT      2      Dropped due to no scored responses
6      7      53025    PT      4      Dropped due to no scored responses
6      7      53026    PT      4      Dropped due to no scored responses
6      7      53027    PT      4      Dropped due to no scored responses
6      7      53028A   PT      2      Dropped due to no scored responses
6      7      53028B   PT      2      Dropped due to no scored responses
6      7      53028C   PT      2      Dropped due to no scored responses
6      7      53029    PT      4      Dropped due to no scored responses
6      7      53030    PT      4      Dropped due to no scored responses
6      7      53031    PT      4      Dropped due to no scored responses
6      7      53032A   PT      2      Dropped due to no scored responses
6      7      53032B   PT      2      Dropped due to no scored responses
6      7      53032C   PT      2      Dropped due to no scored responses
6      7      53126    PT      4      Dropped due to no scored responses
6      7      53127    PT      4      Dropped due to no scored responses
6      7      53128    PT      4      Dropped due to no scored responses
6      7      53129A   PT      2      Dropped due to no scored responses
6      7      53129B   PT      2      Dropped due to no scored responses
6      7      53129C   PT      2      Dropped due to no scored responses
6      7      53768    PT      4      Dropped due to no scored responses
6      7      53769    PT      4      Dropped due to no scored responses
7      6      46544    CAT     3      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
7      6      47824    CAT     1      Dropped as suggested by content review
7      6      47844    CAT     1      Dropped as suggested by content review
7      6      48230    CAT     4      Dropped as suggested by content review
7      6      48333    CAT     1      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
7      6      48350    CAT     1      Dropped as suggested by content review
7      6      48701    CAT     3      Dropped as suggested by content review
7      6      48799    CAT     3      Dropped as suggested by content review
7      6      48801    CAT     3      Dropped due to low item-total correlation
7      6      52257    PT      4      Dropped due to no scored responses
7      6      52268    PT      4      Dropped due to no scored responses
7      6      52269    PT      4      Dropped due to no scored responses
7      6      52390    PT      4      Dropped due to no scored responses
7      6      52398A   PT      2      Dropped due to no scored responses
7      6      52398B   PT      2      Dropped due to no scored responses
7      6      52398C   PT      2      Dropped due to no scored responses
7      6      53021    PT      4      Dropped due to no scored responses
7      6      53023    PT      4      Dropped due to no scored responses
7      6      53024A   PT      2      Dropped due to no scored responses
7      6      53024B   PT      2      Dropped due to no scored responses
7      6      53024C   PT      2      Dropped due to no scored responses


7      6      55085    PT      ?      Dropped due to no scored responses
7      6      55086    PT      ?      Dropped due to no scored responses
7      6      55088A   PT      ?      Dropped due to no scored responses
7      6      55088B   PT      ?      Dropped due to no scored responses
7      6      55088C   PT      ?      Dropped due to no scored responses
7      6      55089    PT      ?      Dropped due to no scored responses
7      6      55090    PT      ?      Dropped due to no scored responses
7      6      55092    PT      ?      Dropped due to no scored responses
7      6      55093    PT      ?      Dropped due to no scored responses
7      6      55094A   PT      ?      Dropped due to no scored responses
7      6      55094B   PT      ?      Dropped due to no scored responses
7      6      55094C   PT      ?      Dropped due to no scored responses
7      6      55095    PT      ?      Dropped due to no scored responses
7      6      55098    PT      ?      Dropped due to no scored responses
7      6      55103A   PT      ?      Dropped due to no scored responses
7      6      55103B   PT      ?      Dropped due to no scored responses
7      6      55103C   PT      ?      Dropped due to no scored responses
7      6      55920    PT      ?      Dropped due to no scored responses
7      6      55922    PT      ?      Dropped due to no scored responses
7      6      55923    PT      ?      Dropped due to no scored responses
7      6      55925    PT      ?      Dropped due to no scored responses
7      6      55926    PT      ?      Dropped due to no scored responses
7      6      55927A   PT      ?      Dropped due to no scored responses
7      6      55927B   PT      ?      Dropped due to no scored responses
7      6      55927C   PT      ?      Dropped due to no scored responses
7      6      56012    PT      ?      Dropped due to no scored responses
7      6      56121    PT      ?      Dropped due to no scored responses
7      7      46117    CAT     ?      Dropped as suggested by content review
7      7      46264    CAT     ?      Dropped as suggested by content review
7      7      46424    CAT     ?      Dropped as suggested by content review
7      7      46442    CAT     ?      Dropped as suggested by content review
7      7      46454    CAT     ?      Dropped due to low item-total correlation
7      7      46472    CAT     ?      Dropped as suggested by content review
7      7      47369    CAT     ?      Dropped as suggested by content review
7      7      47471    CAT     ?      Dropped due to low item-total correlation
7      7      47493    CAT     ?      Dropped as suggested by content review
7      7      47509    CAT     ?      Dropped as suggested by content review
7      7      47872    CAT     ?      Dropped due to low item-total correlation
7      7      47882    CAT     ?      Dropped due to low item-total correlation
7      7      47888    CAT     ?      Dropped as suggested by content review
7      7      47928    CAT     ?      Dropped as suggested by content review
7      7      48405    CAT     ?      Dropped due to low item-total correlation
7      7      52587A   PT      ?      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses


7      7      52587B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53019A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53019B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53028A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53028B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53032A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53032B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53129A   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      7      53129B   PT      2      Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
7      8      46223    CAT     2      Dropped due to no scored responses
7      8      46507    CAT     3      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
7      8      46509    CAT     3      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
7      8      47657    CAT     1      Dropped due to low item-total correlation
7      8      47689    CAT     1      Dropped due to low item-total correlation
7      8      47691    CAT     1      Dropped as suggested by content review
7      8      47799    CAT     3      Dropped as suggested by content review
7      8      48264    CAT     1      Dropped as suggested by content review
7      8      48309    CAT     1      Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
7      8      48344    CAT     3      Dropped due to low item-total correlation
7      8      52467    PT      4      Dropped due to no scored responses
7      8      52472    PT      4      Dropped due to no scored responses
7      8      52473    PT      4      Dropped due to no scored responses
7      8      52477    PT      4      Dropped due to no scored responses
7      8      52586A   PT      2      Dropped due to no scored responses
7      8      52586B   PT      2      Dropped due to no scored responses
7      8      52586C   PT      2      Dropped due to no scored responses
7      8      53038    PT      4      Dropped due to no scored responses
7      8      53039    PT      4      Dropped due to no scored responses
7      8      53040    PT      4      Dropped due to no scored responses
7      8      53041A   PT      2      Dropped due to no scored responses
7      8      53041B   PT      2      Dropped due to no scored responses
7      8      53041C   PT      2      Dropped due to no scored responses
7      8      53042    PT      4      Dropped due to no scored responses

7 8 53043 Dropped due to no scored responses
7 8 53044 Dropped due to no scored responses
7 8 53045A Dropped due to no scored responses
7 8 53045B Dropped due to no scored responses
7 8 53045C Dropped due to no scored responses
7 8 53046 Dropped due to no scored responses
7 8 53047 Dropped due to no scored responses
7 8 53048 Dropped due to no scored responses
7 8 53049 Dropped due to low item-total correlation
7 8 53050A Dropped due to no scored responses
7 8 53050B Dropped due to no scored responses
7 8 53050C Dropped due to no scored responses
7 8 53130 Dropped due to no scored responses
7 8 53131 Dropped due to no scored responses
7 8 53132 Dropped due to no scored responses
7 8 53133A Dropped due to no scored responses
7 8 53133B Dropped due to no scored responses
7 8 53133C Dropped due to no scored responses
8 7 46095 CAT Dropped due to no scored responses
8 7 46424 CAT Dropped as suggested by content review
8 7 46472 CAT Dropped as suggested by content review
8 7 47369 CAT Dropped as suggested by content review
8 7 47471 CAT Dropped due to low item-total correlation
8 7 52478 PT Dropped due to no scored responses
8 7 52480 PT Dropped due to no scored responses
8 7 52587A PT Dropped due to no scored responses
8 7 52587B PT Dropped due to no scored responses
8 7 52587C PT Dropped due to no scored responses
8 7 52780 PT Dropped due to no scored responses
8 7 53018 PT Dropped due to no scored responses
8 7 53019A PT Dropped due to no scored responses
8 7 53019B PT Dropped due to no scored responses
8 7 53019C PT Dropped due to no scored responses
8 7 53025 PT Dropped due to no scored responses
8 7 53026 PT Dropped due to no scored responses
8 7 53027 PT Dropped due to no scored responses
8 7 53028A PT Dropped due to no scored responses
8 7 53028B PT Dropped due to no scored responses
8 7 53028C PT Dropped due to no scored responses
8 7 53029 PT Dropped due to no scored responses
8 7 53030 PT Dropped due to no scored responses
8 7 53031 PT Dropped due to no scored responses
8 7 53032A PT Dropped due to no scored responses
8 7 53032B PT Dropped due to no scored responses


8 7 53032C PT 2 Dropped due to no scored responses
8 7 53126 PT 4 Dropped due to no scored responses
8 7 53127 PT 4 Dropped due to no scored responses
8 7 53128 PT 4 Dropped due to no scored responses
8 7 53129A PT 2 Dropped due to no scored responses
8 7 53129B PT 2 Dropped due to no scored responses
8 7 53129C PT 2 Dropped due to no scored responses
8 7 53768 PT 4 Dropped due to no scored responses
8 7 53769 PT 4 Dropped due to no scored responses
8 8 46223 CAT 2 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
8 8 46225 CAT 2 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
8 8 46507 CAT 3 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
8 8 47243 CAT 1 Dropped as suggested by content review
8 8 47275 CAT 3 Dropped due to low item-total correlation
8 8 47283 CAT 3 Dropped due to low item-total correlation
8 8 47311 CAT 3 Dropped as suggested by content review
8 8 47317 CAT 3 Dropped due to low item-total correlation
8 8 47323 CAT 3 Dropped due to low item-total correlation
8 8 47427 CAT 4 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
8 8 47591 CAT 1 Dropped as suggested by content review
8 8 47599 CAT 1 Dropped as suggested by content review
8 8 47603 CAT 1 Dropped as suggested by content review
8 8 47627 CAT 1 Dropped as suggested by content review
8 8 47647 CAT 1 Dropped as suggested by content review
8 8 47649 CAT 1 Dropped due to low item-total correlation
8 8 47657 CAT 1 Dropped due to low item-total correlation
8 8 47667 CAT 1 Dropped as suggested by content review
8 8 47681 CAT 1 Dropped due to low item-total correlation
8 8 47689 CAT 1 Dropped due to low item-total correlation
8 8 47691 CAT 1 Dropped as suggested by content review
8 8 47695 CAT 1 Dropped as suggested by content review
8 8 47735 CAT 1 Dropped as suggested by content review
8 8 47741 CAT 1 Dropped due to low item-total correlation
8 8 47799 CAT 3 Dropped as suggested by content review
8 8 47948 CAT 2 Dropped as suggested by content review
8 8 47952 CAT 2 Dropped due to low item-total correlation
8 8 47974 CAT 2 Dropped due to low item-total correlation
8 8 47994 CAT 2 Dropped as suggested by content review
8 8 48010 CAT 2 Dropped due to low item-total correlation
8 8 48036 CAT 2 Dropped as suggested by content review


8 8 48186 Dropped due to low item-total correlation
8 8 48264 Dropped as suggested by content review
8 8 48309 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
8 8 48335 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
8 8 48344 Dropped due to low item-total correlation
8 8 52586A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 52586B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 53041A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 53041B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 53045A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
8 8 53045B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
8 8 53050A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 53050B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 53133A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 8 53133B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
8 9 46724 CAT Dropped due to low item-total correlation
8 9 46726 CAT Dropped due to low item-total correlation
8 9 46728 CAT Dropped due to low item-total correlation
8 9 47779 CAT Dropped due to low item-total correlation
8 9 47787 CAT Dropped due to low item-total correlation
8 9 47789 CAT Dropped due to low item-total correlation
8 9 48055 CAT Dropped as suggested by content review
8 9 48067 CAT Dropped as suggested by content review
8 9 48259 CAT Dropped as suggested by content review
8 9 48607 CAT Dropped due to low item-total correlation
8 9 53033 PT Dropped due to no scored responses
8 9 53034 PT Dropped due to no scored responses
8 9 53035 PT Dropped due to no scored responses
8 9 53036 PT Dropped due to no scored responses
8 9 53037A PT Dropped due to no scored responses
8 9 53037B PT Dropped due to no scored responses
8 9 53037C PT Dropped due to no scored responses
8 9 53058 PT Dropped due to no scored responses


8 9 53059 PT 4 Dropped due to no scored responses
8 9 53060 PT 4 Dropped due to no scored responses
8 9 53061A PT 2 Dropped due to no scored responses
8 9 53061B PT 2 Dropped due to no scored responses
8 9 53061C PT 2 Dropped due to no scored responses
8 9 55091 PT 1 Dropped due to no scored responses
8 9 55096 PT 1 Dropped due to no scored responses
8 9 55102 PT 1 Dropped due to no scored responses
8 9 55108A PT 1 Dropped due to no scored responses
8 9 55108B PT 1 Dropped due to no scored responses
8 9 55108C PT 1 Dropped due to no scored responses
8 9 55111 PT 1 Dropped due to no scored responses
8 9 55112 PT 1 Dropped due to no scored responses
8 9 55113 PT 1 Dropped due to no scored responses
8 9 55114A PT 1 Dropped due to no scored responses
8 9 55114B PT 1 Dropped due to no scored responses
8 9 55114C PT 1 Dropped due to no scored responses
8 9 55553 PT 4 Dropped due to no scored responses
8 9 55556 PT 4 Dropped due to no scored responses
8 9 55557 PT 4 Dropped due to no scored responses
8 9 55559A PT 2 Dropped due to no scored responses
8 9 55559B PT 2 Dropped due to no scored responses
8 9 55559C PT 2 Dropped due to no scored responses
8 9 55598 PT 4 Dropped due to no scored responses
8 9 55600 PT 4 Dropped due to no scored responses
8 9 55601 PT 4 Dropped due to no scored responses
8 9 55624 PT 4 Dropped due to no scored responses
8 9 55625 PT 4 Dropped due to no scored responses
8 9 55626 PT 4 Dropped due to no scored responses
8 9 55627A PT 2 Dropped due to no scored responses
8 9 55627B PT 2 Dropped due to no scored responses
8 9 55627C PT 2 Dropped due to no scored responses
8 9 55902 PT 4 Dropped due to no scored responses
8 9 55903 PT 4 Dropped due to no scored responses
8 9 55904 PT 4 Dropped due to no scored responses
8 9 55905A PT 2 Dropped due to no scored responses
8 9 55905B PT 2 Dropped due to no scored responses
8 9 55905C PT 2 Dropped due to no scored responses
9 8 47275 CAT 3 Dropped due to low item-total correlation
9 8 47311 CAT 3 Dropped as suggested by content review
9 8 47317 CAT 3 Dropped due to low item-total correlation
9 8 47427 CAT 4 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
9 8 47627 CAT 1 Dropped as suggested by content review


9 8 47667 Dropped as suggested by content review
9 8 47695 Dropped as suggested by content review
9 8 47948 Dropped as suggested by content review
9 8 47994 Dropped as suggested by content review
9 8 48186 Dropped due to low item-total correlation
9 8 52467 Dropped due to no scored responses
9 8 52472 Dropped due to no scored responses
9 8 52473 Dropped due to no scored responses
9 8 52477 Dropped due to no scored responses
9 8 52586A Dropped due to no scored responses
9 8 52586B Dropped due to no scored responses
9 8 52586C Dropped due to no scored responses
9 8 53038 Dropped due to no scored responses
9 8 53039 Dropped due to no scored responses
9 8 53040 Dropped due to no scored responses
9 8 53041A Dropped due to no scored responses
9 8 53041B Dropped due to no scored responses
9 8 53041C Dropped due to no scored responses
9 8 53042 Dropped due to no scored responses
9 8 53043 Dropped due to no scored responses
9 8 53044 Dropped due to no scored responses
9 8 53045A Dropped due to no scored responses
9 8 53045B Dropped due to no scored responses
9 8 53045C Dropped due to no scored responses
9 8 53046 Dropped due to no scored responses
9 8 53047 Dropped due to no scored responses
9 8 53048 Dropped due to no scored responses
9 8 53050A Dropped due to no scored responses
9 8 53050B Dropped due to no scored responses
9 8 53050C Dropped due to no scored responses
9 8 53130 Dropped due to no scored responses
9 8 53131 Dropped due to no scored responses
9 8 53132 Dropped due to no scored responses
9 8 53133A Dropped due to no scored responses
9 8 53133B Dropped due to no scored responses
9 8 53133C Dropped due to no scored responses
9 9 46724 CAT Dropped due to low item-total correlation
9 9 46726 CAT Dropped due to low item-total correlation
9 9 46728 CAT Dropped due to low item-total correlation
9 9 47779 CAT Dropped due to low item-total correlation
9 9 47787 CAT Dropped due to low item-total correlation
9 9 47789 CAT Dropped due to low item-total correlation
9 9 48055 CAT Dropped as suggested by content review
9 9 48067 CAT Dropped as suggested by content review


9 9 48259 Dropped as suggested by content review
9 9 48607 Dropped due to low item-total correlation
9 9 53037A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 9 53037B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 9 53060 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
9 9 53061A Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 53061B Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 53390 CAT Dropped due to low item-total correlation
9 9 53392 CAT Dropped as suggested by content review
9 9 53421 CAT Dropped due to low item-total correlation
9 9 53435 CAT Dropped as suggested by content review
9 9 53439 CAT Dropped as suggested by content review
9 9 53473 CAT Dropped due to low item-total correlation
9 9 53488 CAT Dropped as suggested by content review
9 9 53490 CAT Dropped due to low item-total correlation
9 9 53492 CAT Dropped due to low item-total correlation
9 9 53630 CAT Dropped due to low item-total correlation
9 9 55108A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 55108B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 55112 PT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
9 9 55114A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 55114B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 55559A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
9 9 55559B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 9 55627A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 9 55627B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 9 55905A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 9 55905B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
9 10 51542 CAT Dropped due to no scored responses
9 10 51554 CAT Dropped due to no scored responses
9 10 53530 CAT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses


9 10 53538 Dropped due to low item-total correlation
9 10 53548 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
9 10 53558 Dropped as suggested by content review
9 10 53594 Dropped as suggested by content review
9 10 53596 Dropped as suggested by content review
9 10 53598 Dropped as suggested by content review
9 10 53600 Dropped as suggested by content review
9 10 53606 Dropped due to low item-total correlation
9 10 53612 Dropped due to low item-total correlation
9 10 53620 Dropped due to low item-total correlation
9 10 53624 Dropped as suggested by content review
9 10 55097 Dropped due to no scored responses
9 10 55101 Dropped due to no scored responses
9 10 55104 Dropped due to no scored responses
9 10 55107A Dropped due to no scored responses
9 10 55107B Dropped due to no scored responses
9 10 55107C Dropped due to no scored responses
9 10 55258 Dropped due to no scored responses
9 10 55259 Dropped due to no scored responses
9 10 55260A Dropped due to no scored responses
9 10 55260B Dropped due to no scored responses
9 10 55260C Dropped due to no scored responses
9 10 55619 Dropped due to no scored responses
9 10 55620 Dropped due to no scored responses
9 10 55621 Dropped due to no scored responses
9 10 55622 Dropped due to no scored responses
9 10 55623A Dropped due to no scored responses
9 10 55623B Dropped due to no scored responses
9 10 55623C Dropped due to no scored responses
9 10 55918 Dropped due to low item-total correlation
9 10 55930 Dropped due to no scored responses
9 10 55931 Dropped due to no scored responses
9 10 55932 Dropped due to no scored responses
9 10 55933A Dropped due to no scored responses
9 10 55933B Dropped due to no scored responses
9 10 55933C Dropped due to no scored responses
10 9 53033 Dropped due to no scored responses
10 9 53034 Dropped due to no scored responses
10 9 53035 Dropped due to no scored responses
10 9 53036 Dropped due to no scored responses
10 9 53037A Dropped due to no scored responses
10 9 53037B Dropped due to no scored responses
10 9 53037C Dropped due to no scored responses
10 9 53058 Dropped due to no scored responses


10 9 53059 Dropped due to no scored responses
10 9 53060 Dropped due to no scored responses
10 9 53061A Dropped due to no scored responses
10 9 53061B Dropped due to no scored responses
10 9 53061C Dropped due to no scored responses
10 9 53378 CAT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
10 9 53390 CAT Dropped due to low item-total correlation
10 9 53392 CAT Dropped as suggested by content review
10 9 53419 CAT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
10 9 53421 CAT Dropped due to low item-total correlation
10 9 53435 CAT Dropped as suggested by content review
10 9 53439 CAT Dropped as suggested by content review
10 9 53473 CAT Dropped due to low item-total correlation
10 9 53488 CAT Dropped as suggested by content review
10 9 53490 CAT Dropped due to low item-total correlation
10 9 53492 CAT Dropped due to low item-total correlation
10 9 53630 CAT Dropped due to low item-total correlation
10 9 55091 PT Dropped due to no scored responses
10 9 55096 PT Dropped due to no scored responses
10 9 55102 PT Dropped due to no scored responses
10 9 55108A PT Dropped due to no scored responses
10 9 55108B PT Dropped due to no scored responses
10 9 55108C PT Dropped due to no scored responses
10 9 55111 PT Dropped due to no scored responses
10 9 55112 PT Dropped due to no scored responses
10 9 55113 PT Dropped due to no scored responses
10 9 55114A PT Dropped due to no scored responses
10 9 55114B PT Dropped due to no scored responses
10 9 55114C PT Dropped due to no scored responses
10 9 55553 PT Dropped due to no scored responses
10 9 55556 PT Dropped due to no scored responses
10 9 55557 PT Dropped due to no scored responses
10 9 55559A PT Dropped due to no scored responses
10 9 55559B PT Dropped due to no scored responses
10 9 55559C PT Dropped due to no scored responses
10 9 55598 PT Dropped due to no scored responses
10 9 55600 PT Dropped due to no scored responses
10 9 55601 PT Dropped due to no scored responses
10 9 55624 PT Dropped due to no scored responses
10 9 55625 PT Dropped due to no scored responses
10 9 55626 PT Dropped due to no scored responses
10 9 55627A PT Dropped due to no scored responses
10 9 55627B PT Dropped due to no scored responses
10 9 55627C PT Dropped due to no scored responses


10 9 55902 PT Dropped due to no scored responses
10 9 55903 PT Dropped due to no scored responses
10 9 55904 PT Dropped due to no scored responses
10 9 55905A PT Dropped due to no scored responses
10 9 55905B PT Dropped due to no scored responses
10 9 55905C PT Dropped due to no scored responses
10 10 48704 CAT Dropped as suggested by content review
10 10 48719 CAT Dropped due to low item-total correlation
10 10 48726 CAT Dropped due to low item-total correlation
10 10 48846 CAT Dropped as suggested by content review
10 10 48897 CAT Dropped due to low item-total correlation
10 10 48907 CAT Dropped as suggested by content review
10 10 48909 CAT Dropped as suggested by content review
10 10 49356 CAT Dropped as suggested by content review
10 10 49530 CAT Dropped due to low item-total correlation
10 10 49532 CAT Dropped due to low item-total correlation
10 10 49536 CAT Dropped as suggested by content review
10 10 49599 CAT Dropped due to low item-total correlation
10 10 49603 CAT Dropped due to low item-total correlation
10 10 53538 CAT Dropped due to low item-total correlation
10 10 53558 CAT Dropped as suggested by content review
10 10 53594 CAT Dropped as suggested by content review
10 10 53596 CAT Dropped as suggested by content review
10 10 53598 CAT Dropped as suggested by content review
10 10 53600 CAT Dropped as suggested by content review
10 10 53606 CAT Dropped due to low item-total correlation
10 10 53612 CAT Dropped due to low item-total correlation
10 10 53620 CAT Dropped due to low item-total correlation
10 10 53624 CAT Dropped as suggested by content review
10 10 55107A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
10 10 55107B PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
10 10 55260A PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
10 10 55260B PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
10 10 55623A PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
10 10 55623B PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
10 10 55918 PT 4 Dropped due to low item-total correlation
10 10 55933A PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses


10 10 55933B PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
10 11 48722 CAT 1 Dropped due to low item-total correlation
10 11 48739 CAT 3 Dropped due to low item-total correlation
10 11 48745 CAT 3 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses
10 11 49180 CAT 1 Dropped as suggested by content review
10 11 49398 CAT 2 Dropped as suggested by content review
10 11 49452 CAT 2 Dropped as suggested by content review
10 11 49460 CAT 2 Dropped as suggested by content review
10 11 49585 CAT 4 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses
10 11 49631 CAT Dropped due to no scored responses
10 11 49635 CAT Dropped due to low item-total correlation
10 11 49675 CAT Dropped as suggested by content review
10 11 55150 PT Dropped due to no scored responses
10 11 55151 PT Dropped due to no scored responses
10 11 55153A PT Dropped due to no scored responses
10 11 55153B PT Dropped due to no scored responses
10 11 55153C PT Dropped due to no scored responses
10 11 55154 PT Dropped due to no scored responses
10 11 55155 PT Dropped due to no scored responses
10 11 55156 PT Dropped due to no scored responses
10 11 55157A PT Dropped due to no scored responses
10 11 55157B PT Dropped due to no scored responses
10 11 55157C PT Dropped due to no scored responses
10 11 55158 PT Dropped due to no scored responses
10 11 55159 PT Dropped due to no scored responses
10 11 55160 PT Dropped due to no scored responses
10 11 55161 PT Dropped due to no scored responses
10 11 55162A PT Dropped due to no scored responses
10 11 55162B PT Dropped due to no scored responses
10 11 55162C PT Dropped due to no scored responses
10 11 55164 PT Dropped due to no scored responses
10 11 55165 PT Dropped due to no scored responses
10 11 55166A PT Dropped due to no scored responses
10 11 55166B PT Dropped due to no scored responses
10 11 55166C PT Dropped due to no scored responses
10 11 55604 PT Dropped due to no scored responses
10 11 55921 PT Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses and sparse responses
10 11 55928 PT 4 Dropped due to no scored responses
10 11 55929 PT 4 Dropped due to no scored responses
10 11 55934 PT 4 Dropped due to no scored responses


10 11 55935 PT Dropped due to no scored responses
10 11 55936A PT Dropped due to no scored responses
10 11 55936B PT Dropped due to no scored responses
10 11 55936C PT Dropped due to no scored responses
10 11 55937 PT Dropped due to no scored responses
10 11 55938 PT Dropped due to no scored responses
10 11 55939 PT Dropped due to no scored responses
10 11 55940A PT Dropped due to no scored responses
10 11 55940B PT Dropped due to no scored responses
10 11 55940C PT Dropped due to no scored responses
10 11 56097 PT Dropped due to no scored responses
11 9 46724 CAT Dropped due to low item-total correlation
11 9 46726 CAT Dropped due to low item-total correlation
11 9 46728 CAT Dropped due to low item-total correlation
11 9 47779 CAT Dropped due to low item-total correlation
11 9 47787 CAT Dropped due to low item-total correlation
11 9 47789 CAT Dropped due to low item-total correlation
11 9 48055 CAT Dropped as suggested by content review
11 9 48067 CAT Dropped as suggested by content review
11 9 48259 CAT Dropped as suggested by content review
11 9 48607 CAT Dropped due to low item-total correlation
11 9 50395 PT Dropped due to no scored responses
11 9 53034 PT Dropped due to no scored responses
11 9 53035 PT Dropped due to no scored responses
11 9 53036 PT Dropped due to no scored responses
11 9 53037A PT Dropped due to no scored responses
11 9 53037B PT Dropped due to no scored responses
11 9 53037C PT Dropped due to no scored responses
11 9 53058 PT Dropped due to no scored responses
11 9 53059 PT Dropped due to no scored responses
11 9 53060 PT Dropped due to no scored responses
11 9 53061A PT Dropped due to no scored responses
11 9 53061B PT Dropped due to no scored responses
11 9 53061C PT Dropped due to no scored responses
11 9 53378 CAT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 9 53390 CAT Dropped due to low item-total correlation
11 9 53392 CAT Dropped as suggested by content review
11 9 53416 CAT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 9 53419 CAT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 9 53421 CAT Dropped due to low item-total correlation
11 9 53435 CAT Dropped as suggested by content review
11 9 53439 CAT Dropped as suggested by content review
11 9 53473 CAT Dropped due to low item-total correlation
11 9 53488 CAT Dropped as suggested by content review


117 


ANolaalial 


iksyaal 


iksvaal 


CAT/ 
PT 


Claim 


PILOT ANALYSIS SUMMARY OF RESULTS 


Pre-Treatment 


Ci gzle[= 
11 
11 
11 
44. 
11 
11 
11 
44. 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11. 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
141 
11 
11 
11 


Grade Number 


OOD DODODODODDANADNDADANADNDADNDADNDAIDADAIDADAIDADAIDADADADADAOADAOADODAODAODADAODADADODDODO OOO O O O O 


53490 
53492 
53630 
55091 
55096 
55102 
55108A 
55108B 
55108C 
55111 
55112 
55113 
55114A 
55114B 
55114C 
5005S 
55556 
5999 / 
55559A 
55559B 
55559C 
55598 
55600 
55601 
55624 
55625 
55626 
55627A 
55627B 
55627C 
55902 
55903 
55904 
55905A 
55905B 
55905C 
48704 
48719 
48726 
48846 
48897 
48907 
48909 
48923 


CAT 
CAT 
CAT 
CAT 
CAT 
CAT 
CAT 
CAT 


‘e8) 


PRRRPRRWWNHINNNHBPKHBRNHNHNHPAHRHBPHPRBRHYHNHNHAHPHPPRPBBRPBRBRBRRPBR BRB D W& 


Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 




PILOT ANALYSIS SUMMARY OF RESULTS 


Admin Grade  Item Grade  Item Number  CAT/PT  Claim  Pre-Treatment

11 10 48925 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 10 49002 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses and sparse responses
11 10 49356 CAT 2 Dropped as suggested by content review
11 10 49530 CAT 4 Dropped due to low item-total correlation
11 10 49532 CAT 4 Dropped due to low item-total correlation
11 10 49536 CAT 4 Dropped as suggested by content review
11 10 49599 CAT 3 Dropped due to low item-total correlation
11 10 49603 CAT 3 Dropped due to low item-total correlation
11 10 51542 CAT 2 Dropped due to no scored responses
11 10 51554 CAT 2 Dropped due to no scored responses
11 10 53517 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 10 53530 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 10 53538 CAT 1 Dropped due to low item-total correlation
11 10 53548 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses and sparse responses
11 10 53558 CAT 1 Dropped as suggested by content review
11 10 53594 CAT 2 Dropped as suggested by content review
11 10 53596 CAT 2 Dropped as suggested by content review
11 10 53598 CAT 2 Dropped as suggested by content review
11 10 53600 CAT 2 Dropped as suggested by content review
11 10 53606 CAT 3 Dropped due to low item-total correlation
11 10 53612 CAT 3 Dropped due to low item-total correlation
11 10 53620 CAT 3 Dropped due to low item-total correlation
11 10 53624 CAT 4 Dropped as suggested by content review
11 10 53628 CAT 4 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses
11 10 55097 PT 4 Dropped due to no scored responses
11 10 55101 PT 4 Dropped due to no scored responses
11 10 55104 PT 4 Dropped due to no scored responses
11 10 55107A PT 2 Dropped due to no scored responses
11 10 55107B PT 2 Dropped due to no scored responses
11 10 55107C PT 2 Dropped due to no scored responses
11 10 55258 PT 4 Dropped due to no scored responses
11 10 55259 PT 4 Dropped due to no scored responses
11 10 55260A PT 2 Dropped due to no scored responses
11 10 55260B PT 2 Dropped due to no scored responses
11 10 55260C PT 2 Dropped due to no scored responses
11 10 55619 PT 4 Dropped due to no scored responses
11 10 55620 PT 4 Dropped due to no scored responses
11 10 55621 PT 4 Dropped due to no scored responses
11 10 55622 PT 4 Dropped due to no scored responses
11 10 55623A PT 2 Dropped due to no scored responses
11 10 55623B PT 2 Dropped due to no scored responses




Admin Grade  Item Grade  Item Number  CAT/PT  Claim  Pre-Treatment

11 10 55623C PT 2 Dropped due to no scored responses
11 10 55918 PT 4 Dropped due to low item-total correlation
11 10 55930 PT 4 Dropped due to no scored responses
11 10 55931 PT 4 Dropped due to no scored responses
11 10 55932 PT 4 Dropped due to no scored responses
11 10 55933A PT 2 Dropped due to no scored responses
11 10 55933B PT 2 Dropped due to no scored responses
11 10 55933C PT 2 Dropped due to no scored responses
11 11 48705 CAT 2 Dropped as suggested by content review
11 11 48709 CAT 1 Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses
11 11 48712 CAT 1 Dropped due to low item-total correlation
11 11 48722 CAT 1 Dropped due to low item-total correlation
11 11 48739 CAT 3 Dropped due to low item-total correlation
11 11 48745 CAT 3 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses
11 11 49133 CAT 1 Dropped due to low item-total correlation
11 11 49139 CAT 1 Dropped due to low item-total correlation
11 11 49180 CAT 1 Dropped as suggested by content review
11 11 49190 CAT 1 Dropped due to low item-total correlation
11 11 49198 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses and sparse responses

11 11 49200 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to nonmonotonic responses
11 11 49228 CAT 1 Dropped due to low item-total correlation
11 11 49236 CAT 1 Dropped due to low item-total correlation
11 11 49238 CAT 1 Dropped due to low item-total correlation
11 11 49392 CAT 2 Dropped due to low item-total correlation
11 11 49398 CAT 2 Dropped as suggested by content review
11 11 49408 CAT 2 Dropped as suggested by content review
11 11 49420 CAT 2 Dropped as suggested by content review
11 11 49448 CAT 2 Dropped as suggested by content review
11 11 49452 CAT 2 Dropped as suggested by content review
11 11 49460 CAT 2 Dropped as suggested by content review
11 11 49468 CAT 2 Dropped as suggested by content review
11 11 49472 CAT 2 Dropped as suggested by content review
11 11 49502 CAT 1 Dropped due to low item-total correlation
11 11 49559 CAT 4 Dropped as suggested by content review
11 11 49635 CAT 3 Dropped due to low item-total correlation
11 11 49657 CAT 3 Dropped due to low item-total correlation
11 11 49675 CAT 3 Dropped as suggested by content review
11 11 50204 CAT 3 Dropped due to low item-total correlation
11 11 54367 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 11 55153A PT 2 Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55153B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55157A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55157B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55162A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55162B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55166A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
11 11 55166B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
11 11 55921 PT Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11 11 55936A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55936B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,3 due to sparse responses
11 11 55940A PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses
11 11 55940B PT Collapsed categories: 0,1,2,3,4 becomes 0,0,1,2,2 due to sparse responses


Table B.2. Math Items Receiving Pre-treatment before Calibration based on Data Cleaning Procedure 


Smarter Balanced Assessment Consortium 


PANolaalial 


Grade Grade Number 
47167 
47185 


WOW WW W 


ann»oyr BPHHPHPHPHPHPAPHPHRHHPHPPAPwowvowwvowwwo wo W W 


Item 


woOwWWW WwW 


RHP HPan»niaoodiysip HP PF Pwowwowowvowo wo wi sf HP HPPHPPHKH SA 


iksvaal 


48754 
51720 
51728 
51756 
51806 


45981 
45983 
51666 
51672 
51684 
51686 
51694 
51696 
51700 
51702 
47167 
48754 
51756 
51816 
51822 
51632 
51834 
51856 
45981 
45983 
51666 
51672 
53177 
53303 
53760 
54341 
54363 
54955 
53147 
53151 
53155 
53157 


OPP RB Ww 


PRPRPRBIRA PRP RP BIR RP RR BIWNNHKBRBRPRWWINPRNBRPRPBRB BB 




Pre-Treatment 


Dropped as suggested by content review 


Collapsed categories: 0,1,2,3 becomes 0,1,1,2 due to nonmonotonic responses 

Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 


Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses 

Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 




Admin Grade  Item Grade  Item Number  CAT/PT  Claim  Pre-Treatment

5 4 53159 CAT 1 Dropped due to no scored responses
5 4 53229 CAT 2 Dropped due to no scored responses
5 4 53271 CAT 2 Dropped due to no scored responses
5 5 42994 CAT 3 Dropped as suggested by content review
5 5 43574 CAT 1 Dropped as suggested by content review
5 5 45625 CAT 4 Dropped due to low item-total correlation
5 5 45967 CAT 2 Dropped due to low item-total correlation
5 5 47158 CAT 1 Dropped as suggested by content review
5 5 53101 CAT 2 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
5 5 53303 CAT 1 Dropped due to low item-total correlation
5 5 54365 CAT 2 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
5 5 54955 CAT 4 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
5 6 42925 CAT 1 Dropped as suggested by content review
5 6 43482 CAT 2 Dropped as suggested by content review
5 6 46647 CAT 3 Dropped due to no scored responses
5 6 46938 CAT 1 Dropped due to low item-total correlation
5 6 48655 CAT 2 Dropped as suggested by content review
5 6 52924 CAT 1 Dropped due to no scored responses
5 6 52930 CAT 1 Dropped due to no scored responses
6 5 42994 CAT 3 Dropped as suggested by content review
6 5 43574 CAT 1 Dropped as suggested by content review
6 5 45625 CAT 4 Dropped due to low item-total correlation
6 5 45967 CAT 2 Dropped due to low item-total correlation
6 5 52617 CAT 1 Dropped due to no scored responses
6 5 53095 CAT 1 Dropped due to no scored responses
6 5 53097 CAT 1 Dropped due to no scored responses
6 5 53099 CAT 1 Dropped due to no scored responses
6 5 53103 CAT 2 Dropped due to no scored responses
6 6 42781 CAT 1 Dropped as suggested by content review
6 6 42925 CAT 1 Dropped as suggested by content review
6 6 42986 CAT 3 Dropped as suggested by content review
6 6 43209 CAT 1 Dropped as suggested by content review
6 6 43383 CAT 2 Dropped as suggested by content review
6 6 43482 CAT 2 Dropped as suggested by content review
6 6 43787 CAT 1 Dropped as suggested by content review
6 6 43910 CAT 1 Dropped as suggested by content review
6 6 44038 CAT 1 Dropped as suggested by content review
6 6 44055 CAT 1 Dropped as suggested by content review
6 6 46045 CAT 1 Dropped due to low item-total correlation
6 6 46049 CAT 1 Dropped as suggested by content review
6 6 46558 CAT 1 Dropped as suggested by content review
6 6 46652 CAT 4 Dropped as suggested by content review
6 6 46938 CAT 1 Dropped due to low item-total correlation
6 6 46944 CAT 1 Dropped due to low item-total correlation


ANolaalial 


CT gs\els 


6 
6 
6 
6 
6 
6 
6 
6 
6 


NNN NN NNN NON ONIN NN NNNNNSN NONI OOO OD DW WHO HOOD DMD OD O 


iksyaal 


Ci gs\o (sme LU lanl eleva 


NODA AMAAMAD DOD Oo 


NNN NWNWNWNNNNNDDMHDAMVDAMNAM ON NNwNNNNWNWNWNANNG 


iksvaal 


MNWRNHRERRENE EB 


RPRREPNHNHRFPRRPANBHBIBR BR RBNBBRBNBRBBRP RBBB RBRBBRBRBBDNB 




Pre-Treatment 


Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 


Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses 


Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 


Collapsed categories: 0,1,2 becomes 0,0,1 due to nonmonotonic responses 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 




Admin Grade  Item Grade  Item Number  CAT/PT  Claim  Pre-Treatment

7 7 44831 CAT 3 Dropped as suggested by content review
7 7 44832 CAT 3 Dropped as suggested by content review
7 7 44906 CAT 1 Dropped as suggested by content review
7 7 45059 CAT 1 Dropped as suggested by content review
7 7 45090 CAT 4 Dropped as suggested by content review
7 7 45093 CAT 1 Dropped as suggested by content review
7 7 45103 CAT 1 Dropped as suggested by content review
7 7 45105 CAT 1 Dropped as suggested by content review
7 7 45581 CAT 1 Dropped as suggested by content review
7 7 45726 CAT 1 Dropped due to low item-total correlation
7 7 46061 CAT 2 Dropped due to low item-total correlation
7 7 46785 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
7 7 46816 CAT 1 Dropped due to low item-total correlation
7 7 46818 CAT 1 Dropped due to low item-total correlation
7 7 46824 CAT 1 Dropped due to low item-total correlation
7 7 46826 CAT 1 Dropped due to low item-total correlation
7 7 46834 CAT 1 Dropped due to low item-total correlation
7 7 46838 CAT 1 Dropped due to low item-total correlation
7 7 46840 CAT 1 Dropped due to low item-total correlation
7 7 46842 CAT 1 Dropped due to low item-total correlation
7 7 46844 CAT 1 Dropped due to low item-total correlation
7 7 46856 CAT 1 Dropped due to low item-total correlation
7 7 46864 CAT 1 Dropped due to low item-total correlation
7 7 46866 CAT 1 Dropped as suggested by content review
7 7 46868 CAT 1 Dropped due to low item-total correlation
7 7 46872 CAT 1 Dropped due to low item-total correlation
7 7 46886 CAT 1 Dropped as suggested by content review
7 7 46902 CAT 1 Dropped due to low item-total correlation
7 7 46918 CAT 2 Dropped as suggested by content review
7 7 48075 CAT 1 Dropped as suggested by content review
7 8 42807 CAT 1 Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses
7 8 42942 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
7 8 43529 CAT 2 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
7 8 44193 CAT 2 Dropped as suggested by content review
7 8 44834 CAT 3 Dropped as suggested by content review
7 8 45993 CAT 1 Dropped due to no scored responses
7 8 46803 CAT 1 Dropped as suggested by content review
7 8 46807 CAT 1 Dropped due to no scored responses
7 8 47010 CAT 1 Dropped due to low item-total correlation
7 8 47092 CAT 1 Dropped as suggested by content review
7 8 48113 CAT 1 Dropped due to no scored responses
8 7 42867 CAT 1 Dropped due to no scored responses
8 7 43394 CAT 2 Dropped as suggested by content review
8 7 43524 CAT 1 Dropped as suggested by content review


ANolaalial 


Ci gs\els 
8 


CO C&O © C0 CO WOO WMO WO WO WOO WOW WO WO WO WMO WOW WOO WO WO WO WO WO WO WO CO; 0 © © WO WO WO WO Ww W W 


iksyaal 


Grade Number 


CO © © 0 © WOO WO WMO W0O WOW WOO WOW WO WOO WMO WO WO WO WO WO WO CO WO) N N N N N N N N N N N 


iksvaal 


PRPHOHNRFPRPNRBRRRBRRNHNHNBPNPRBRRPWARKBRNHENNBPRBPBRBINBRBRNHER KR WWHR BH 




Pre-Treatment 


Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to no scored responses 

Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped as suggested by content review 
Dropped due to low item-total correlation 




ANolaalial 


Ci gs\els 


8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
8 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 
9 


iksyaal 


Ci gsle (sme LU lanl elsia 


OOO WOW wWowowowvowowvdwowvdwdWdadonanonanonanowsdadnoedndcedndanaddada da DaoaDnaDnanaDnaDnaDnaDnaDnanaDnanaadadaddOa Oo O| & 


iksvaal 


MU NAINNNBRPRBRNRWNHRBRBRBBBRBRBBI BBR BRBBBRRBPRBRBRRPWRBRBRRBRBRBNHWBNHNAN BH] B 




Pre-Treatment 


Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to no scored responses 

Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped as suggested by content review 
Dropped as suggested by content review 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped due to low item-total correlation 
Dropped as suggested by content review 


Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses 


Dropped as suggested by content review 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to no scored responses 

Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 




ANolaalial 


CT gs\els 
9 


OOOO DO DODD DDADADANDADANADAADANIDVDANADADADADADADADADADADADAIDADAADANADAOA DODO DODO DO DODO OOOO OOO O O O O 


iksyaal 
CT gslels: 


OOOO DODODODODODODODADADADNDADADADAIDADADADADADADAOA DOGO OOOO OO OO O O 


iksvaal 
I Ulaaleteya 


PPRPRENHEPRANNHNNRBKBINBPRBRBRNHBPRPBBRBBRBRRBRRBRBRWPRPWWRPRBRRPRBP RBBB BB 




Pre-Treatment 


Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 


Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses 


Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped due to no scored responses 

Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 


Dropped as suggested by content review 
Dropped due to low item-total correlation 


Collapsed categories: 0,1,2,3 becomes 0,1,1,1 due to sparse responses 


Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to no scored responses 




Admin Grade  Item Grade  Item Number  CAT/PT  Claim  Pre-Treatment

9 10 47026 CAT 1 Dropped as suggested by content review
9 10 48788 CAT 1 Dropped due to no scored responses
9 10 48794 CAT 1 Dropped due to no scored responses
10 9 44128 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
10 9 44388 CAT 1 Dropped as suggested by content review
10 9 44444 CAT 1 Dropped due to low item-total correlation
10 9 44475 CAT 1 Dropped due to low item-total correlation
10 9 44478 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
10 9 44489 CAT 1 Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses
10 9 44944 CAT 2 Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses
10 9 44948 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
10 9 44952 CAT 1 Dropped as suggested by content review
10 9 44958 CAT 1 Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
10 9 44959 CAT 1 Dropped due to low item-total correlation
10 9 44972 CAT 3 Dropped as suggested by content review
10 9 45098 CAT 3 Dropped due to low item-total correlation
10 9 45396 CAT 1 Dropped as suggested by content review
10 9 45400 CAT 1 Dropped due to low item-total correlation
10 9 45402 CAT 1 Dropped due to low item-total correlation
10 9 45410 CAT 1 Dropped due to low item-total correlation
10 9 45426 CAT 1 Dropped due to low item-total correlation
10 9 45450 CAT 1 Dropped due to low item-total correlation
10 9 45480 CAT 2 Dropped as suggested by content review
10 9 45489 CAT 1 Dropped due to no scored responses
10 9 45578 CAT 2 Dropped due to low item-total correlation
10 10 43535 CAT 2 Dropped as suggested by content review
10 10 43543 CAT 2 Dropped as suggested by content review
10 10 43687 CAT 2 Dropped as suggested by content review
10 10 43701 CAT 2 Dropped as suggested by content review
10 10 44517 CAT 1 Dropped as suggested by content review
10 10 44525 CAT 1 Dropped due to low item-total correlation
10 10 44735 CAT 2 Dropped as suggested by content review
10 10 44737 CAT 2 Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses
10 10 45077 CAT 1 Dropped due to low item-total correlation
10 10 45079 CAT 1 Dropped due to low item-total correlation
10 10 45645 CAT 2 Dropped due to low item-total correlation
10 10 46988 CAT 1 Dropped as suggested by content review
10 10 46994 CAT 1 Dropped as suggested by content review
10 10 47026 CAT 1 Dropped as suggested by content review
10 10 47080 CAT 1 Dropped due to low item-total correlation
10 10 49776 CAT 1 Dropped due to low item-total correlation
10 10 49778 CAT 1 Dropped as suggested by content review
10 10 49780 CAT 2 Dropped due to low item-total correlation
10 11 42724 CAT 1 Dropped due to no scored responses




ANolaalial 
Grade 


iksyaal 
CT gslels: 


iksvaal 
NU laalelsya 




Pre-Treatment 


10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 


OOOO OO DODO OOO OO O O O 


BPRPRRPRPRBRNRPWWNHNHANHKBFPRPRARRNBRARKBRNHPRBRBRRBRBRBRBRNWNHRENB BB 


Dropped due to no scored responses 

Dropped as suggested by content review 

Dropped as suggested by content review 

Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 
Dropped as suggested by content review 

Dropped due to low item-total correlation 

Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 
Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped due to no scored responses 

Dropped due to no scored responses 

Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 
Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped due to no scored responses 




ANolaalial 


iksyaal 


iksvaal 




Pre-Treatment 


Ci gzle[= 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 
11 


Ci gs\o (sme LU lanl eleva 


OOOO DODODODODADAVDIDADNDAIDANADADNDAIDADAIDANAIDADAADANAADAOA DODO DODO DODO DADADADADAIDANADAIDANAIDADADADADADA GO OOOO O O O 


MNPRPRRPRRPRBRNHBPRBPBBRBBRBBRBRRBPBBRBRRBRWRRPRBRWWRRRBRRBRRBRRBRRBENHBPRWRBNHER BB 


Dropped due to no scored responses 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped due to no scored responses 
Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses 
Dropped due to no scored responses 
Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped due to no scored responses 
Dropped due to no scored responses 
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to no scored responses 
Dropped due to low item-total correlation 
Dropped due to no scored responses 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped as suggested by content review 
Dropped due to no scored responses 
Dropped due to no scored responses 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 




Admin  Item   Item    CAT/
Grade  Grade  Number  PT    Claim  Pre-Treatment
11     10     42950   CAT   1      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11     10     43101   CAT   1      Dropped due to no scored responses
11     10     43112   CAT   2      Dropped due to no scored responses
11     10     43115   CAT   4      Dropped due to no scored responses
11     10     43414   CAT   3      Dropped due to no scored responses
11     10     43489   CAT   2      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11     10     43535   CAT   2      Dropped as suggested by content review
11     10     43538   CAT   3      Dropped due to no scored responses
11     10     43543   CAT   2      Dropped as suggested by content review
11     10     43687   CAT   2      Dropped as suggested by content review
11     10     43701   CAT   2      Dropped as suggested by content review
11     10     43740   CAT   2      Dropped due to no scored responses
11     10     43859   CAT   4      Dropped due to no scored responses
11     10     44349   CAT   1      Dropped due to no scored responses
11     10     44443   CAT   1      Dropped due to no scored responses
11     10     44517   CAT   1      Dropped as suggested by content review
11     10     44525   CAT   1      Dropped due to low item-total correlation
11     10     44604   CAT   1      Dropped due to no scored responses
11     10     44610   CAT   1      Dropped due to no scored responses
11     10     44615   CAT   1      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11     10     44735   CAT   2      Dropped as suggested by content review
11     10     44737   CAT   2      Collapsed categories: 0,1,2,3 becomes 0,1,1,1 due to sparse responses
11     10     44774   CAT   2      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11     10     44776   CAT   3      Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses
11     10     45066   CAT   1      Dropped due to no scored responses
11     10     45067   CAT   1      Dropped due to no scored responses
11     10     45068   CAT   1      Dropped due to no scored responses
11     10     45069   CAT   1      Dropped due to no scored responses
11     10     45071   CAT   4      Dropped due to no scored responses
11     10     45074   CAT   3      Dropped due to no scored responses
11     10     45075   CAT   4      Collapsed categories: 0,1,2,3 becomes 0,1,1,1 due to sparse responses
11     10     45077   CAT   1      Dropped due to low item-total correlation
11     10     45079   CAT   1      Dropped due to low item-total correlation
11     10     45083   CAT   2      Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
11     10     45084   CAT   3      Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses
11     10     45089   CAT   1      Dropped due to no scored responses
11     10     45463   CAT   1      Dropped due to no scored responses
11     10     45466   CAT   1      Dropped due to no scored responses
11     10     45470   CAT   1      Dropped due to no scored responses
11     10     45643   CAT   1      Dropped due to no scored responses
11     10     45645   CAT   2      Dropped due to low item-total correlation
11     10     45917   CAT   1      Dropped due to no scored responses
11     10     46611   CAT   1      Dropped due to no scored responses
11     10     46613   CAT   1      Dropped due to no scored responses


Pre-Treatment (continued)

Admin Grade is 11 for each of the 44 rows below. The Item Grade, Item Number, CAT/PT, and Claim columns for these rows are not recoverable.


Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped due to no scored responses 

Dropped as suggested by content review 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped due to no scored responses 

Dropped due to no scored responses 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Dropped as suggested by content review 

Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Dropped due to low item-total correlation 

Dropped as suggested by content review 


Pre-Treatment (continued)

Admin Grade is 11 for each of the 21 rows below. The Item Grade, Item Number, CAT/PT, and Claim columns for these rows are not recoverable.


Dropped due to low item-total correlation 
Dropped as suggested by content review 


Collapsed categories: 0,1,2 becomes 0,1,1 due to sparse responses
Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses


Dropped due to low item-total correlation 
Dropped due to low item-total correlation 


Collapsed categories: 0,1,2,3 becomes 0,1,2,2 due to sparse responses


Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped as suggested by content review 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
Dropped due to low item-total correlation 
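
Many of the drops above cite a low item-total correlation, that is, a weak relationship between an item's score and the score on the rest of the test. A minimal Python sketch of that statistic follows. It assumes the corrected (item-rest) variant, in which the item is removed from the total before correlating. The score matrix is made up, the function names are hypothetical, and the cutoff used in the pilot analysis is not stated here, so none is applied.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlations(score_matrix):
    """Correlate each item with the total of the remaining items."""
    n_items = len(score_matrix[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in score_matrix]
        rest = [sum(row) - row[j] for row in score_matrix]
        results.append(pearson(item, rest))
    return results


# Made-up data: rows are students, columns are items scored 0/1.
scores = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
]
r = item_rest_correlations(scores)
for j, value in enumerate(r):
    print(f"item {j}: item-rest correlation = {value:.3f}")
```

In this made-up matrix the third item runs against the other two, so its correlation is negative; an item like that would be a candidate for review or removal.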




Smarter 
Balanced 


Assessment Consortium 


Table B.3. ELA Items Receiving Treatment during IRT Calibration Under Different Model 
Combinations 


Only the Item Number and Treatment columns are reproduced here; the remaining columns (Admin Grade, Item Grade, CAT/PT, Claim, and the model-combination marks) did not survive extraction in aligned form.

Item Number  Treatment
54269        Category starting value re-assigned
53995        Guessing fixed at 0.25
53977        Guessing starting at 0.25
54045        Guessing starting at 0.25
54197        Guessing starting at 0.25
54209        Guessing starting at 0.25
54287        Guessing starting at 0.25
54382        Guessing starting at 0.25
54998        Guessing starting at 0.25
55002        Guessing starting at 0.25
55011        Guessing starting at 0.25
54097        Guessing starting at 0.25
56188B       Dropped due to LID
54426        Guessing fixed at 0.25
54998        Guessing fixed at 0.25
55011        Guessing fixed at 0.25
55002        Guessing starting at 0.25
55008        Guessing starting at 0.25
90392        Guessing starting at 0.25
54430        Guessing starting at 0.25
54492        Guessing starting at 0.25
55277        Guessing starting at 0.25
55362        Guessing starting at 0.25
55316        Guessing starting at 0.25
55426        Guessing starting at 0.25
54726        Guessing starting at 0.25
54498        Guessing starting at 0.25
55316        Guessing starting at 0.25
55110B       Dropped due to LID
55547A       Dropped due to LID
56274B       Dropped due to LID
56322A       Dropped due to LID
54726        Guessing fixed at 0.25
54678        Guessing starting at 0.25
54884        Guessing starting at 0.25
54698        Guessing starting at 0.25
54714        Guessing starting at 0.25
54776        Guessing starting at 0.25
54808        Guessing starting at 0.25
54818        Guessing starting at 0.25
54918        Guessing starting at 0.25
Table B.3 (continued)

Only the Item Number and Treatment columns are reproduced here; the remaining columns did not survive extraction in aligned form. Item numbers that are garbled in the source are shown as extracted.

Item Number  Treatment
54924        Guessing starting at 0.25
53646        Guessing starting at 0.25
52843        Guessing starting at 0.25
b232 1.      Guessing starting at 0.25
53646        Guessing starting at 0.25
52398B       Dropped due to LID
53024A       Dropped due to LID
55088A       Dropped due to LID
55103A       Dropped due to LID
55927B       Dropped due to LID
52847        Guessing fixed at 0.25
52837        Guessing starting at 0.25
52839        Guessing starting at 0.25
52845        Guessing starting at 0.25
52349        Guessing starting at 0.25
52633        Guessing starting at 0.25
52639        Guessing starting at 0.25
52675        Guessing starting at 0.25
52677        Guessing starting at 0.25
52679        Guessing starting at 0.25
52750        Guessing starting at 0.25
52768        Guessing starting at 0.25
52776        Guessing starting at 0.25
52863        Guessing starting at 0.25
52871        Guessing starting at 0.25
52877        Guessing starting at 0.25
47830        Guessing starting at 0.25
53024B       Category starting value re-assigned
47525        Guessing starting at 0.25
47557        Guessing starting at 0.25
47830        Guessing starting at 0.25
52587A       Dropped due to LID
53019B       Dropped due to LID
53028A       Dropped due to LID
53032B       Dropped due to LID
53129B       Dropped due to LID
46480        Category starting value re-assigned
H9632        Category starting value re-assigned
A7467        Guessing fixed at 0.25
47557        Guessing starting at 0.25
47517        Guessing starting at 0.25


Table B.3 (continued)

Only the Item Number and Treatment columns are reproduced here; the remaining columns did not survive extraction in aligned form. Item numbers that are garbled in the source are shown as extracted.

Item Number  Treatment
47886        Guessing starting at 0.25
47795        Guessing starting at 0.25
55632        Category starting value re-assigned
47467        Guessing starting at 0.25
53041B       Dropped due to LID
53045A       Dropped due to LID
53050B       Dropped due to LID
47319        Guessing fixed at 0.25
53049        Guessing fixed at 0
47687        Guessing starting at 0.25
47960        Guessing starting at 0.25
47397        Guessing starting at 0.25
47643        Guessing starting at 0.25
47653        Guessing starting at 0.25
47663        Guessing starting at 0.25
47699        Guessing starting at 0.25
47956        Guessing starting at 0.25
47976        Guessing starting at 0.25
47327        Guessing starting at 0.25
47329        Guessing starting at 0.25
47333        Guessing starting at 0.25
47968        Guessing starting at 0.25
53134        Guessing starting at 0.25
47329        Guessing starting at 0.25
47333        Guessing starting at 0.25
47968        Guessing starting at 0.25
53134        Guessing starting at 0.25
53049        Guessing fixed at 0
53037B       Dropped due to LID
53061B       Dropped due to LID
55108B       Dropped due to LID
55559B       Dropped due to LID
55627B       Dropped due to LID
55114B       Dropped due to LID
55905B       Dropped due to LID
47785        Guessing starting at 0.25
53423        Guessing starting at 0.25
53433        Guessing starting at 0.25
53450        Guessing starting at 0.25
HSo12        Guessing fixed at 0.10
46672        Guessing fixed at 0.25
53550        Guessing starting at 0.25
53616        Guessing starting at 0.25


Table B.3 (continued)

An X marks each model combination (1PL/PC, 2PL/GPC, 3PL/GPC) under which the item received the treatment; the column position of each X is not recoverable. Item numbers that are garbled in the source are shown as extracted.

Admin  Item   Item     CAT/
Grade  Grade  Number   PT    Claim  Treatment                  Models
10     9      53372    CAT   1      Guessing starting at 0.25  X
10     9      53423    CAT   1      Guessing starting at 0.25  X
10     9      53433    CAT   1      Guessing starting at 0.25  X
10     10     55107B   PT    2      Dropped due to LID         X X
10     10     55260B   PT    2      Dropped due to LID         X X
10     10     55623B   PT    2      Dropped due to LID         X X
10     10     55933B   PT    2      Dropped due to LID         X X
10     10     53580    CAT   2      Guessing starting at 0.25  X
10     10     53614    CAT   3      Guessing starting at 0.25  X
10     10     53616    CAT   3      Guessing starting at 0.25  X
10     10     49332    CAT   2      Guessing starting at 0.25  X
10     10     49619    CAT   3      Guessing starting at 0.25  X
10     11     49280    CAT   1      Guessing starting at 0.25  X
10     11     49633    CAT   3      Guessing starting at 0.25  X
11     9      A/(((7   CAT   1      Guessing starting at 0.25  X
11     9      48167    CAT   1      Guessing starting at 0.25  X
11     9      53372    CAT   1      Guessing starting at 0.25  X
11     9      53431    CAT   1      Guessing starting at 0.25  X
11     9      53433    CAT   1      Guessing starting at 0.25  X
11     9      53455    CAT   2      Guessing starting at 0.25  X
11     9      53486    CAT   3      Guessing starting at 0.25  X
11     10     53580    CAT   2      Guessing starting at 0.25  X
11     10     53608    CAT   3      Guessing starting at 0.25  X
11     10     49332    CAT   2      Guessing starting at 0.25  X
11     11     55153B   PT    2      Dropped due to LID         X X
11     11     55940A   PT    2      Dropped due to LID         X X
11     11     49384    CAT   2      Guessing starting at 0.25  X
11     11     49368    CAT   2      Guessing starting at 0.25  X
11     11     49544    CAT   4      Guessing starting at 0.25  X
11     11     49679    CAT   3      Guessing starting at 0.25  X
11     11     49723    CAT   3      Guessing starting at 0.25  X
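
The "Guessing" treatments in Table B.3 refer to the lower-asymptote parameter of the three-parameter logistic (3PL) model: "starting at 0.25" supplies an initial value for estimation, while "fixed at 0.25" holds the parameter constant. A minimal Python sketch of the standard 3PL response function follows. The parameter values are made up for illustration, and the scaling constant `D` is an assumption (set to 1.0 here), since the calibration settings are not given in this table.

```python
import math


def p_3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: guessing (lower asymptote); D: scaling constant (assumed 1.0).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Made-up item with guessing held at 0.25, as for a four-option item.
a, b, c = 1.0, 0.0, 0.25

for theta in (-4.0, 0.0, 4.0):
    print(f"theta={theta:+.1f}  P(correct)={p_3pl(theta, a, b, c):.3f}")
```

With `c = 0.25`, even very low-ability examinees have roughly a one-in-four chance of answering correctly, which is why 0.25 is a natural starting or fixed value for a four-option selected-response item.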




Smarter 
Balanced 


Assessment Consortium 


Table B.4. Math Items Receiving Treatment during IRT Calibration Under Different Model 
Combinations 


An X marks each model combination under which the item received the treatment; the column position of each X is not recoverable. A "?" marks a cell that is illegible in the source, and item numbers that are garbled in the source are shown as extracted.

Admin  Item   Item    CAT/
Grade  Grade  Number  PT    Claim  Treatment                            Models
3      4      43075   CAT   3      Category starting value re-assigned  X
3      3      51730   CAT   1      Guessing starting at 0.25            X
3      3      51754   CAT   1      Guessing starting at 0.25            X
4      5      44361   CAT   1      Category starting value re-assigned  X
4      4      43097   CAT   1      Category starting value re-assigned  X
4      4      45957   CAT   1      Guessing starting at 0.25            X
4      4      51682   CAT   1      Guessing starting at 0.25            X
4      4      45959   CAT   1      Guessing starting at 0.25            X
5      5      45909   CAT   1      Guessing starting at 0.25            X
5      5      45931   CAT   1      Guessing starting at 0.25            X
?      ?      53319   CAT   1      Guessing starting at 0.25            X
?      5      53362   CAT   1      Guessing starting at 0.25            X
5      5      53091   CAT   1      Guessing starting at 0.25            X
3      5      53093   CAT   1      Guessing starting at 0.25            X
5      5      53307   CAT   1      Guessing fixed at 0.25               X
6      7      44822   CAT   1      Category 2 and 3 merged              X X
6      7      45052   CAT   3      Category starting value re-assigned  X X
6      6      43705   CAT   2      Category starting value re-assigned  X
6      6      47006   CAT   1      Guessing starting at 0.25            X
6      ?      42983   CAT   1      Dropped due to poor statistics (G2 higher than 1,000 and r-bis lower than .01)  X
7      8      47012   CAT   1      Guessing fixed at 0.25               X
8      9      42890   CAT   1      Category starting value re-assigned  X
8      7      44821   CAT   1      Category starting value re-assigned  X
9      9      44951   CAT   2      Category starting value re-assigned  X X
9      10     45073   CAT   3      Category 3 and 4 merged              X X
9      10     47058   CAT   1      Guessing fixed at 0.10               X
10     9      A2/6/   CAT   1      Category starting value re-assigned  X
10     11     43304   CAT   1      Category 2 and 3 merged              X
10     11     44843   CAT   1      Category starting value re-assigned  X
10     11     50069   CAT   1      Guessing fixed at 0.00               X
11     11     44853   CAT   1      Category starting value re-assigned  X
11     11     50131   CAT   1      Guessing fixed at 0.25               X
11     11     46932   CAT   1      Guessing fixed at 0.25               X


139 


Appendix K— 2014 Field Test Report 


Page 29 of 39 


Smarter 
Balanced 


Assessment Consortium 


FIELD TEST REPORT

From March 25 to June 13, 2014, more than 4.2 million students, 16,549 schools, and thousands of teachers participated in the Smarter Balanced Field Test, the largest online assessment ever.

The Field Test helped Smarter Balanced ensure that test questions are accurate and fair for all students. The Field Test also gave students, teachers, and schools an opportunity to experience the assessment under real-world conditions and prepare for administration in spring 2015. Because questions may be revised or dropped after the Field Test, students did not receive scores.


Smarter Balanced member states learned important lessons about test administration and technology readiness that will be used to improve the assessments next year. In addition, almost all member states conducted surveys of students, teachers, and administrators to elicit feedback about the test and the testing process.

Smarter Balanced is commissioning a thorough, independent review of this feedback and will release a complete report later this summer. However, based on state feedback to date, help desk calls, and other indicators, the Consortium has already identified the following lessons:

The technology performed well

• The test delivery platform functioned well, with limited glitches given the volume of testing: approximately 184,000 simultaneous users on peak days.

• Software and system issues were identified and resolved quickly.

• Schools tested technology and bandwidth under real-world conditions and identified practical steps to eliminate glitches, such as disabling automatic software updates that could disrupt student testing.

Field Test by the Numbers

4.2 MILLION STUDENTS
16.5 THOUSAND SCHOOLS
12.2 MILLION TESTS COMPLETED*
4.5 million tests administered with accessibility features
1,100 Help Desk inquiries per day

*Smarter Balanced assessments consist of two parts. Each “test” reported is one of the two parts of the English language arts (ELA)/literacy or math assessment.

SmarterBalanced.org


Maintaining test security in the age of social media can be a challenge

• Smarter Balanced worked with states to develop a successful process for finding test questions posted online by students.

• District Test Coordinators, administrators, and teachers focused on strengthening test security, and the numbers of postings decreased dramatically.

All students can participate in online assessments

• Students could access an unprecedented number of language supports, including interactive glossaries in 10 languages and multiple dialects, as well as full Spanish translations of the math assessment.

• Students who are deaf or hard of hearing received tests in American Sign Language, signed by recorded human interpreters.

• Refreshable Braille keyboards and real-time embossers allowed students who are blind to receive their online tests in Braille.

Ongoing communication is essential

• Smarter Balanced provided schools with communications materials to reach out to parents and respond to media inquiries about the Field Test.

• After identifying areas that caused confusion in some schools, the Test Administration Manual and test system user guides were edited for clarity.

• Smarter Balanced will continue to improve these documents as we collect feedback from schools.


Next Steps 




Appendix L— Summative Assessment Alignment Study Brief 


Page 30 of 39 


Summative Assessment Alignment Study: 
Smarter Balanced TAC Update 


Overview 


• Background of Alignment Project and Purpose of Update
• Focused TAC Member Review
• Next steps


Project Background 


Goals & Deliverables, amended 
Contractor - HumRRO 
Approach 


Progress reports to TAC 
• July 2013, November 2013, and July 2014


Charge to contractor from 7/17/14 meeting 


Focused TAC Member Review 
August 2014 


Objective 

Process 

Participants (Brian Gong, Joe Ryan) 

Outcomes 

— Found analysis approach innovative, commendable 
— Noted still a work in progress 

— TAC members focused on communication 


— Suggestions incorporated by HumRRO in revised 
document 


Overview of HumRRO Alignment 
Approach and Methodology 


• Variant of Webb alignment approach
• Adapted to Smarter Balanced’s circumstances


— Focus on test specifications as well as items/forms 
— Tuned to Smarter Balanced’s test specifications 
    • Content specification structure
    • DOK specification structure
— CAT design 
— Smarter Balanced schedule and data availability 
for operational items/“forms” 


Summary of HumRRO Study 


[Diagram: chain of evidence among the following elements]

Common Core State Standards
Content Specifications
Item Specs/Evidence Statements
Smarter Balanced Summative Assessments

Legend

Chain of evidence
Indirect chain of evidence
Black Circle: Alignment examined via direct alignment ratings
Red Circle: Not currently proposed in study


Example: Analysis A 


• Focus: Relationships between Common Core State Standards (CCSS) and Smarter Balanced Content Specifications
    • Content
    • DOK (Depth of Knowledge)


Example: Analysis A -2 


Common Core


State Standards 


Smarter Balanced 
Content Specs 


> Clan) liminelemeyurlaceis 
Balanced/CCSS 


(Math) 


Domain 


Cluster 


Standard 


(Math) 
Claim 


Domain 


Cluster 


Target 


Standard 


“Students can explain and apply mathematical 
concepts and carry out mathematical procedures 
with precision and fluency” 


Number and Operations/Mathematical Practices 


Number and Operations—Base Ten 


“Represent and solve problems involving multiplication and 
division” 

“Understand properties of multiplication and the 
relationship between multiplication and division” 

“Multiply and divide within 100” 


“Interpret products of whole numbers, e.g., interpret 5 × 7 as
the total number of objects in 5 groups of 7 objects each”
“Understand division as an unknown-factor problem”
“Fluently multiply and divide within 100, using strategies such
as the relationship between multiplication and division [e.g.,
knowing that 8 × 5 = 40, one knows that 40 ÷ 5 = 8] or
properties of operations”


Smarter Balanced Alignment Study Update 
to TAC - 8/27/14 


Example: Smarter Balanced Content Specifications

Callouts on the excerpt below:
• Target, designated major [m] or supporting [a/s]
• Designated Clusters
• Target text describing assessment target
• Designated content Standards
• Designated DOK (may be multiple)

Claim #1: Students can explain and apply mathematical concepts and carry out mathematical procedures with precision and fluency.

Operations and Algebraic Thinking
Target A [m]: Represent and solve problems involving multiplication and division. (DOK 1)
Items/tasks for this target require students to use multiplication and division within 100 to solve straightforward, one-step contextual word problems in situations involving equal groups, arrays, and measurement quantities such as length, liquid volume, and masses/weights of objects. These problems should be of the equal-groups and arrays-situation types, but can include more difficult measurement quantity situations. All of these items/tasks will code straightforwardly to standard 3.OA.3. Few of these tasks coding to this standard will make the method of solution a separate target of assessment. Other tasks associated with this target will probe student understanding of the meanings of multiplication and division (3.OA.1,2). Non-contextual tasks that explicitly ask the student to determine the unknown number in a multiplication or division equation relating three whole numbers (3.OA.4) will support the development of items that provide a range of difficulty necessary for populating an adaptive item bank (see section Understanding Assessment Targets in an Adaptive Framework, below, for further explication).
Target B [m]: Understand properties of multiplication and the relationship between multiplication and division. (DOK 1)
Target C [m]: Multiply and divide within 100. (DOK 1)
Target D [m]: Solve problems involving the four operations, and identify and explain patterns in arithmetic. (DOK 2)

Number and Operations—Base Ten
Target E [a/s]: Use place value understanding and properties of arithmetic to perform multi-digit arithmetic. (DOK 1)
Target F [m]: Develop understanding of fractions as numbers. (DOK 1, 2)

Measurement and Data
Target G [m]: Solve problems involving measurement and estimation of intervals of time, liquid volumes, and masses of objects. (DOK 1, 2)
Target H [a/s]: Represent and interpret data. (DOK 2)
Target I [m]: Geometric measurement: understand concepts of area and relate area to multiplication and to addition. (DOK 2)
Target J [a/s]: Geometric measurement: recognize perimeter as an attribute of plane figures and distinguish between linear and area measures. (DOK 1)

Geometry
Target K [a/s]: Reason with shapes and their attributes. (DOK 1, 2)


Two-way alignment 


• Alignment is typically analyzed in relation to a reference. Two-way alignment checks the relationship using both CCSS and the Smarter Balanced content specifications as the reference.

[Venn diagram: CCSS Content Standards (one circle) and Smarter Balanced Content Specifications (Targets) (other circle), with regions numbered 1 to 3]

1 = CCSS content standards not aligned with Smarter Balanced Content Specifications (Targets)
2 = CCSS content standards and Smarter Balanced Content Specifications aligned with each other
3 = Smarter Balanced Content Specifications (Targets) not aligned with CCSS content standards
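
The three regions of the two-way alignment check can be expressed as simple set operations. The Python sketch below is an illustration only: the function name is hypothetical, and the identifiers "STD-X" and "Target Z" are placeholders invented to populate regions 1 and 3, not real standards or targets.

```python
def two_way_alignment(standards, targets, links):
    """Split standards and targets into the three regions of the diagram.

    links is a set of (standard, target) pairs judged to be aligned.
    """
    linked_standards = {s for s, _ in links}
    linked_targets = {t for _, t in links}
    return {
        1: standards - linked_standards,  # standards with no aligned target
        2: set(links),                    # standard-target pairs that align
        3: targets - linked_targets,      # targets with no aligned standard
    }


# "STD-X" and "Target Z" are hypothetical placeholders for unaligned entries.
standards = {"3.OA.1", "3.OA.3", "STD-X"}
targets = {"Target A", "Target Z"}
links = {("3.OA.1", "Target A"), ("3.OA.3", "Target A")}

regions = two_way_alignment(standards, targets, links)
for region in (1, 2, 3):
    print(region, sorted(regions[region]))
```

Checking both directions matters because a one-way check from standards to targets would never reveal a target that measures content outside the standards.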


Analysis A— Focus & Questions 


Criterion: Content Representation 


The content representation (CR) criteria examine how 
well the content in the CCSS are represented by the 
assessment Targets. The CR investigations are focused 
on the following six questions: 


Criterion: DOK Distribution 


The DOK distribution (DD) criteria examine the
reviewers’ DOK distribution of the targets compared
to the DOK distribution identified in the Smarter
Balanced content specifications. The DD investigations
are focused on the following three questions:


Question A.CR-1. Do the grade-level standards collectively 
reflect the content and skills required by the target? 


Question A.CR-2. Do the targets collectively reflect the content 
and skills required by the grade-level standard? 

Question A.CR-3. Do the individual grade-level standards reflect the content 
and skills required by the intended targets? 

Question A.CR-4. Do the individual targets reflect the content and skills
required by the intended grade-level standard? 

Question A.CR-5. Does each mathematical practice reflect skills required by 
the intended target? 


Question A.CR-6. Do the reviewers agree with the intended mapping of 
targets and grade-level standards as identified in the content specifications? 


Question A.DD-1. Does the DOK distribution of the targets 
identified by the reviewers match that of the distribution 
identified in the content specifications (using the max DOK 
level)? 


Question A.DD-2. Does the DOK distribution of the targets
identified by the reviewers match that of the distribution
identified in the content specifications (using each
independent DOK level)?


Question A.DD-3. Do the reviewers agree with the intended 
target DOK levels as identified in the content specifications? 


Example Analysis A Question and Specific 
Methodology 


Question A.CR-1. Do the grade-level standards collectively reflect the content
and skills required by the target?

Analysis: Compute the mean percentage of targets that were rated holistically as (a) fully-aligned (target was adequately measured across all aligned grade-level standards), (b) mostly-aligned, (c) somewhat-aligned, and (d) small portion aligned.

Step 1. For each reviewer, compute the percentage of targets that were rated holistically as (a) fully-aligned, (b) mostly-aligned, (c) somewhat-aligned, and (d) small portion aligned to the full set of grade-level standards.

Step 2. For each claim, compute the average percentage for each alignment rating (e.g., fully-aligned, mostly-aligned) across reviewers.

Available Data: Reviewers’ holistic target coverage ratings (how well the target was represented by all of the grade-level standards identified by reviewers as being aligned to that target), on a 4-point scale.

Table Example 1. Mean Percentage of Targets at Each Holistic Rating (made-up data)

Holistic Target Rating
Grade | Claim | Fully-aligned | Mostly-aligned | Somewhat-aligned | Small-portion aligned

[Table values are not recoverable.]

Next Steps 


Application of approach to analyze alignment of 
Smarter Balanced summative assessments (via 
test specifications) Sept./Oct. 2014 

Report submitted to Smarter Balanced by Oct. 
2014 


Reviews, modifications, approval, dissemination 
by end of Dec. 2014 


Smarter Balanced responsible for completing 
additional alignment studies after operational 
items and “forms” available 


Comments by TAC 


Appendix M— Test Validation Worksheet


Page 31 of 39 


Activity | Contract Number | Evidence
(The worksheet's check-mark columns are not recoverable. "[illegible]" marks text that cannot be read in the source.)

1. Audit of test construction | Contract: [illegible] | Evidence: Test blueprints, algorithm, item bank summaries, [illegible]
2. [illegible] precision | Contract: [illegible] | Evidence: Tech Manual, SEM FT reports, simulations
3. [illegible] | Contract: 19 | Evidence: [illegible]
4. Evaluation of scoring | Contract: [illegible] | Evidence: AI scoring research and report; report of human scoring processes, reliability
5. Analysis of scaling and [illegible] | Contract: [illegible] | Evidence: Pilot analysis of dimensionality study and choice of IRT model.
6. Evaluation of standard setting | Contract: [illegible] | Evidence: Report of ALD development; Standard setting plan; Final report of standard setting.
8 . 


development anc 


review; DIF analysis for pilot and field test; sampling 
and recruitment; scoring processes and monitoring; 


/ 


8 Evaluation of equitable 


particp. & access 




10 Content validity and 
alignment 


_ [11 Evaluating ECD EHH th 
3 12 IRT residual analysis HR 


14,16 |standards, test balance; analysis of blueprint fidelity 


14,16 |Report of item development activities; test blueprints 


1 ws Technical report; Field test and simulation analysis 




Technical report. 


oe Alignment report from contractor-item alignment to 
Cognitive lab reports from pilot and field test; Summay 
14, 16, 19|response time analysis 


15. Cognitive interviews, think-aloud | Contract: 14,16 | Evidence: Cognitive lab reports from pilot and field test
16. Decision consistency and [illegible] | Contract: [illegible] | Evidence: Report from standard setting vendor.
17. Cut-score standard errors | Contract: [illegible] | Evidence: Report from standard setting vendor.


18 Criterion-related validation 
18 18 Ceterionrelaed validation (il me ||| Future |Future report from longitudinal operational data. 


Analysis of pilot and field test survey data; Report of 
ee | |v 14,16, |reporting focus groups; surveys of panelists in ALD 
19) OUPS, SUTVEYS 15,23 {writing and standard setting. 
tht bL il 


20 Criterion-related validation Proposed |Proposed study with Core to College states using 
20}of readiness V V]V}Y study |external readiness measures 

21 Surveys of postsecondary Reports of surveys from regional higher ed 
21feducators representatives and state higher ed leads. 

22 Analysis of enrollment, 
: Se CO Futur eportfom ngtinalpeaioal s 


223 Teacher morale suveys CCCCRISCSTIIST LT reruns repr fom oni operon dats 


a SEE FETEEAEE EERE He = 
24lin students Future [Future report from longitudinal operational data 

amen nn 
25lasnirations survevs Future |Future report from longitudinal operational data 


26 Evaluation of vertical scale |Pilot report of dimensionality and scaling structure; Field test report of scale construction, stability.


27 Criterion-related studies re: gain/growth |Future report from longitudinal operational data.
28 Follow-up on specific 

28 cea a decisions Future |Future report from longitudinal operational data. 


29 Sensitivity to instruction Future |Future report from longitudinal operational data.


30 Analysis of classroom artifacts Future |Future report from longitudinal operational data.


ep eore ep Or unm ane |v |V Focus groupreports from scoring contractor. Collective 
af larity information from state leads and superintendents. 
SRA GUIAACRNOIAAE nDOD Cl  —————— 
32I rates Future report from longitudinal operational data 
33 Analysis of reliability of Psychometric contractor sets methods, criteria;
; Pere CETL TT eee — 


spp Generatizabiiny suas | | [xP | LN ELE] 5 frectnicatmanal 


35 Item parameter drift Psychometric contractor establishes method; Studies to 
5, future |be conducted with ongoing operational data. 


Report from pilot and field test item development 
36 Audit of ay and | | | | vendor on item development practices and item 
sensitivity review 14,16 |sensitivity review methods and execution.
6, 19, 
37 Audit of test accommodations External studies |Report of accommodations eligibility and delivery from administration vendors.
Tesees=—EP REPRE ec 
38 functioning _ DIF reports in technical report 
fad LL LE BELLE ELE LLL | I ce fer soon om oni operation a 
391 validit Future |Future report from longitudinal operational data. 
: eee 


Test differential functioning (tech report), 
= Analysis of group disaggregated reports by ethnic group, disability, ELL 
differences status 


42 Multitrait/multimethod 5 |Tech report; part of dimensionality study


ao BUATACHAATCAODE i E == — 
43I curriculum surve Future |Likely to be carried out by states; 
44 Validation of content clusters |Survey of alignment between teachers' instructional units and interim block assessments. Likely carried out by states.


45 Analysis of interim usage statistics |Future report from longitudinal operational data.


46 Surveys, interviews, focus groups of (high) users of interim assessments Future |Future report from longitudinal operational data.

47 Audit of formative Report from instructional resources vendorof 

resources development and development. Implementation data from operational 
a TEIN 

CETTE rae freer toni apt 
48] formative Future |Future report from longitudinal operational data 

sem TPP REECTEE CT oP oa 
49} leadership Future |Future report from longitudinal operational data. 

psosmmnanon LL TE EEL EEL LLL EL PEPER] 25 dpc tominseusins sources ender 
50lassessment survevs 23 Report from instructional resources vendor. 


51 Formative assessment user 

51 rece ssesmente TTT LT ELLE ELL ef ace Future report from longitudinal operational data. 
52 Parent, student formative 

52 ee ee Future report from longitudinal operational data. 


53 Case studies of frequent 

ofies  T LLE TTT ET TTT e191 re [rr eprom onsiatint operation! 
54 Critique of Theory of 
ue ane VIII |v || py [aif] Res 

eee Agenda |Checklist completion--ongoing 


55 Evaluation of local 
cdsorinator mrermenten — | LLL ELLE LLL LEI tomer report tom longitudinal operations data 


Summative purposes 


The purposes of the Smarter Balanced Summative assessments are to provide valid, reliable and fair information about:

1. students’ ELA and Mathematics achievement with respect to those CCSS measured by the ELA and Mathematics summative assessments
2. whether students prior to Grade 11 have demonstrated sufficient academic proficiency in ELA and Mathematics to be on track for achieving college readiness
3. whether Grade 11 students have sufficient academic proficiency in ELA and Mathematics to be ready to take credit-bearing college courses
4. students’ annual progress toward college and career readiness in ELA and Mathematics
5. how instruction can be improved at the classroom, school, district, and state level
6. students’ ELA and Mathematics proficiencies for Federal accountability purposes and potentially for state and local accountability systems
7. students’ achievement in ELA and Mathematics that is equitable for all students and subgroups of students


Interim Purposes 
The purposes of the Smarter Balanced Interim assessments are to provide valid, reliable and fair information about:

1. student progress toward mastery of the skills measured in ELA and Mathematics by the summative assessment
2. students’ performance at the content cluster level so teachers and administrators can track student progress throughout the year and adjust instruction accordingly
3. individual and group (e.g., school, district) performance at the claim level in ELA and Mathematics to determine whether teaching and learning are on target
4. student progress toward the mastery of skills measured in ELA and Mathematics across all subgroups of students


Instructional Resources Purposes 


The purposes of the Smarter Balanced Formative Assessment Resources are to provide measurement tools and resources to:

1. improve teaching and learning
2. monitor student progress throughout the school year
3. help teachers and other educators align instruction, curricula, and assessment
4. help teachers and other educators use the Summative and Interim assessments to improve instruction at the individual student and classroom levels
5. illustrate how teachers and other educators can use assessment data to engage students in monitoring their own learning


Appendix N - 2013 Pilot School Districts and Charter Schools


Page 32 of 39 


Status | Response | School Name | Local Education Agency Name
po veS ADRIAN ELEM. ADRIAN 
po VeS TS CADRIANSR.HIGH ADRIAN 
po VeS ADVANCE ELEM. —CADVANCERIV 
po VeS TT CADVANCEHIGH CADVANCERIV 
PINOT CROGERSMIDDLEAFFTON 201 
po ves ALBANY MIDDLE ALBANY RI 
VES CVIRGINIAE. GEORGE ELEM. ALBANY R-II 
po VES CALLEN VILLAGESCHOOL ALLEN VILLAGE 
Pp vesCSALTENBURGELEM. —ALTENBURG 4B 
po ves ALTON ELEM. ALTON 
pT VES—CAPPLETONCITYHIGH APPLETONCITYRN 
po VeSCCASSCO.ELEM. OO ARCHIERV 
po vesCASHGROVEELEM. CC ASHGROVER-IV 
po INOCASHGROVEHIGH ASHGROVERIV 
Pp VeSCAURORAJRHIGH OS JAURORAR-VII 
po vesAVAMIDDLE AVA 
NEW {| Avillaschool Avilla R13. School District 
Pp VES BALLARDHIGH BALLARD 
po ves BELL CITYELEM. OO BELLCITYRI 
| [Nonresponsive |BELLEVIEWELEM. | BELLEVIEWRMN 
Pp VeS CAMBRIDGE ELEM. BELTON'124 
NEW [| Hillcrest Elementary [Belton School District #124 
po VeS BERNIE ELEM. BERNIER-XIN 
po VeSBERNIEHIGH OO BERNIER-XIN 
po ves SBEVIERHIGH BEVIERC-4 
NEW | (Blair Oaks Elementary Blair OaksRN 
NEW [| Blair Oaks Middle Blair OaksRN 
pVES— BLUE SPRINGSHIGH SS BLUESPRINGSR-IV 
pT iVeS [FRESHMAN CRT.-G.BAKER BLDG. |BLUESPRINGSR-IV 
po VeS SS JOHNNOWLIN ELEM. BLUESPRINGSR-IV 
Pp VES—CBOLIVARINTERMEDIATE SCH. |BOLIVARR-2. 
Pp veSBOLIVARMIDDLEBOLIVARR-D 
po VeS SC IBOONVILLEHIGH —JBOONVILLERT 
po VES—CBRECKENRIDGEHIGH SS BRECKENRIDGER-| 
Pp VES CS IBRONAUGH ELEM. JBRONAUGH RVI 
Pp VES BROOKFIELDHIGH SS JBROOKFIELD RI 
Pp VES [BROOKSIDE CHARTER SCH. BROOKSIDE CHARTERSCH. 
| __[Nonresponsive [BunkerElem. BUNKER RI 
po ves TT CaainsvilleHigh SS CAINSVILLERS} 
PINOT Cameron Middle CAMERON 
po VesCICANTONELEM. OO CANTONR-VO 
NEW | [Carl Junction Intermediate [Carl Junction R-1 School District 
New | Carl Junction Intermediate [Carl Junction School District 
po VES CARTHAGE JR. HIGH ICARTHAGER-IX 
po VeS ST FAIRVIEWELEM. OS CARTHAGERIX 
po vesSISTEADLEYELEM. SC CARTHAGERIX 
PINOT CARTHAGE TECHNICAL CENTER-NORCARTHAGER-IX 
Pp VES CICASSVILLEINTERMEDIATE —CASSVILLER-IV 
po ves CENTER MIDDLE CENTERS 


Pp VeSCENTERSR.HIGH OS CENTER58 
NEW | [West Elementary CentralR:3 
INO CENTRALIAINTERMEDIATE CENTRALIAR-VI 
Pp VeS ST |WARRENE.HEARNESELEM. |CHARLESTONR-| 
| ___|Nonresponsive |CHARLESSTONHIGH SS |CHARLESTONR-| 
po VES TC CHILHOWEE ELEM. CHILHOWEERIV 
| _|Nonresponsive_|CITY GARDEN MONTESSORI SCHOOUCITY GARDEN MONTESSORI 
| __[Nonresponsive |CLARKCO.MIDDLE SS CLARKCO.RA 
Pp ves SIGLENRIDGE ELEM. CLAYTON, 
po VeSCCOLECO.RIMIDDLE TS COLECOwWRA 
| YES | FAIRVIEW ELEM. | COLUMBIA 93
| YES | FREDRICK DOUGLASS HIGH | COLUMBIA 93
| YES | JOHN RIDGEWAY ELEM. | COLUMBIA 93
| YES | MARY PAXTON KEELEY ELEM. | COLUMBIA 93
| YES | NEW HAVEN ELEM. | COLUMBIA 93
| YES | ROBERT E. LEE ELEM. | COLUMBIA 93
| YES | RUSSELL BLVD. ELEM. | COLUMBIA 93
| YES | ULYSSES S. GRANT ELEM. | COLUMBIA 93
| YES | WEST BLVD. ELEM. | COLUMBIA 93
| NO | JUVENILE JUSTICE CTR. | COLUMBIA 93
| NO | MILL CREEK ELEM. | COLUMBIA 93
Pp VES SC ICONCORDIAELEM. SS ICONCORDIARSH 
| _|Nonresponsive_|CONSTRUCTION CAREERS CENTER CONSTRUCTION CAREERS CENTER 
| __[Nonresponsive |COOTERELEM. [COOTER R-IV 
po VeS CRANE MIDDLE CRANE 
po VeS TS CCROCKERHIGH CROCKER 
| __|Nonresponsive |CRYSTALCITYELEM. SS CRYSTALCITY47 
| ___|Nonresponsive |CRYSTALCITYHIGH CRYSTALCITY47 
po vesCDAVISELEM. OO DAVISRXI 
| __[Nonresponsive |DELLALAMBELEM | DELLALAMBELEM 
po veSATHENAELEM. DESOTO73,— 
po VeS TSS HILLMIDDLE I DEXTERRXI 
PINOT CALTERNATIVE RESOURCE CTR. [DIVISION OF YOUTH SERVICES 
po ves DIXON ELEM. DIXON 
NEW {| Dixon High School Dixon R-ISchools 
| __[Nonresponsive [DORAELEM. SC JDORARMN 
PINOT PEVELYELEM. OO JDUNKLINRV 
| ___[Nonresponsive_|HERCULANEUMHIGH SCHOOL |DUNKLINRV 
Pp Ves CEASTBUCHANANHIGH EAST BUCHANAN CO.C-2 
po VeS ALJ. MARTINELEM. EAST PRAIRIER“W 
| ___|Nonresponsive_|ELDORADO SPRINGS MIDDLE __|ELDORADOSPRINGSR-I|_ 
| __|Nonresponsive [ELDON UPPERELEM. JELDONRA 
po VeS EMINENCE ELEM. SC EMINENCER| 
po VeS TT EMINENCEHIGH OS EMINENCERT 
| ___|Nonresponsive [EXCELSIOR SPRINGS CAREER CTR. __|EXCELSIORSPRINGS40_ 
| Ves CFAIRGROVE MIDDLE FAIR. GROVE R-X 
po VeS FAIRFAX ELEM. FAIRFAX 
Pp VES SC |WASHINGTON-FRANKLIN ELEM. |FARMINGTONR-VI 


| YES | AIRPORT ELEM. | FERGUSON-FLORISSANT R-II
| YES | BERMUDA ELEM. | FERGUSON-FLORISSANT R-II
| YES | CROSS KEYS MIDDLE | FERGUSON-FLORISSANT R-II
| YES | HOLMAN ELEM. | FERGUSON-FLORISSANT R-II
| YES | JOHNSON WABASH ELEM. | FERGUSON-FLORISSANT R-II
| YES | LEE HAMILTON ELEM. | FERGUSON-FLORISSANT R-II
| YES | MCCLUER NORTH HIGH | FERGUSON-FLORISSANT R-II
| YES | PARKER ROAD ELEM. | FERGUSON-FLORISSANT R-II
| YES | WALNUT GROVE ELEM. | FERGUSON-FLORISSANT R-II
) ives FESTUS INTERMEDIATE FESTUSR-VI 
/ ives FORDLAND HIGH SCHOOL [FORDLAND RW 
) ives FORDLAND MIDDLE FORDLAND RW 
) ives FORSYTHHIGH FORSYTH RM 
ves FORSYTH MIDDLE FORSYTH RUM 
) ives BUCKNER ELEM. CC FORTOSAGER-} 
) ives FIRE PRAIRIEMIDDLE ss [FORTOSAGER-} 
ives ANTONIAELEM, FOX 
| ives CIGEORGEGUFFEVELEM. FOX) 
) ives CISECKMANHIGHSCHOOL FOXC-6 
) ives FRANCIS HOWELL MIDDLE FRANCIS HOWELL 
| ves HENDERSON ELEM. FRANCISHOWELLRM 
| ves FRANKLINCO.ELEM. FRANKLINCO.RAN 
New | Fredericktown Middle School __|Fredericktown R-ISchool District 
| __|Nonresponsive_|FRONTIER SCHOOL OF INNOVATION|FRONTIER SCHOOL OF INNOVATION 
| ves FOREST PARKELEM. FT. ZUMWALT Rl 

) ves JOSTMANNELEM. FT. ZUMWALT Rl 

) ves ROCK CREEK ELEM. FT. ZUMWALT Rl 

p NO) BARTLEVELEM. FULTON 
p NO) IMCINTIRE ELEM. FULTON 
ives GAINSVILLEHIGH SS GAINESVILLER-V 
) ves HERMANNMIDDLE SC IGASCONADECO.RA 
| ___|Nonresponsive _|GATEWAY SCIENCE ACAD/ST LOUIS [GATEWAY SCIENCE ACAD/ST LOUIS 
| ives CIGENNESISSCHOOLINC. __—GENESISSCHOOLINC. 
) ives GIDEON ELEM. CGIDEON 37 
| __|Nonresponsive |GILLIAMELEM. SS GILLIAMC-@ 
ives GILMANCITYELEM. SS IGILMANCITYRIV 
| ___|Nonresponsive |GLENWOODELEM. GLENWOOD R-VIN 
| __|Nonresponsive |GORDONPARKSELEM [GORDON PARKSELEM 
) ives MATTHEWSELEM. ——GRAINVALLEVR-V 
ives CGREENCITYELEM. ——SGREENCITYR-A 
) ives GREEN CITYHIGHSCHOOL __GREENCITYR-A 
/ ives CIGREENFORESTELEM. GREEN FORESTR- 
| ives CGREENRIDGEELEM. CS GREENRIDGER-VINN 
New | Hale Hale RA 
ives HAMILTONELEM. SS THAMILTONR 
/ ives JOAKWOOD ELEM. CHANNIBAL6O 
/ ives CVETERANS ELEM. HANNIBALGO 
) ves HARDIN-CENTRALELEM. __[HARDIN-CENTRALC-2_ 
| ives HARDIN-CENTRALHIGH [HARDIN-CENTRALC-2_ 
) ves HARRISONVILLEMIDDLE SS [HARRISONVILLER-IX 
| _|Nonresponsive |HARTVILLEELEM. SS HARTVILLERH 
) ves ICENTRALMIDDLE HAZELWOOD 
ves EAST MIDDLE HAZELWOOD 
ives ANAELEM, HAZELWOOD 
) ives CLARIMORE ELEM. HAZELWOOD 
) ives TOWNSEND ELEM. HAZELWOOD 


PF IYES.)———sSTTWILLIMAN ELEM. HAZELWOOD 


YES ——_|WINDSORHIGH 
INDEPENDENCE 30 


George Caleb Bingham Middle Independence 30 
School 


ioneer Ridge Middle School Independence 30 


Independence 30 
RON CO. C-4 
ACKSON R-II 


S 

W 

p 

Spring Branch Elementar Independence 30 
T 

W 


| 

J 

ORCHARD DRIVEELEM. _—_—_—_|JACKSON RII 
JASPTER CO. R-V 
JEFFERSONHIGH ——_| JEFFERSON C-123 
Jefferson C-123 
JEFFERSON CITY HIGH JEFFERSON CITY 
TELEGRAPH INTERMEDIATE _| JEFFERSON CO. R-VII 


Danby Rush Tower Middle School |Jefferson County R-VII School District 


YES 
C 
E 


S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 


YE 
YE 
YE 
YE 
YE 
YE 
YE 
YE 
YES 
YES 
YES 
YES 
YES 
YES 
YES 
YES 
YES 
YE 

YE 

YE 

YES 
YES 
YES 
YE 

YE 

YE 

YE 

YE 

YES 


NOU CONTRACT. KANSAS CITY33— 
INO CEASTHIGH SCHOOL. KANSASCITY33— 
INO [NORTHEASTHIGH SCHOOL KANSAS CITY 33 
Pp VES [DOGWOOD ELEM. KEARNEYRI 
po ves KING CITYELEM. KING CITY 
PINOT KINGSVILLEHIGH OS KINGSVILLERT 
po veS TT KIRBYVILLE MIDDLE KIRBYVILLER-VI 
PINOT KIRKSVILLE AREATECH CTR. KIRKSVILLE 
PINOT KIRKSVILLESR. HIGH KIRKSVILLE RW 
po VeS LA PLATAELEM. 0 LAPLATAR-H 
po veS LA PLATAHIGH LA PLATAR-H 




po VeSEZARDELEM. OO CLACLEDECO.RA 
po Ves LADUE MIDDLE LADUE 
po vesCSPOEDE ELEM. LADUE 
| ___[Nonresponsive |GRANDVIEWELEM. SS LAFAYETTECO.C-2 
| ___[Nonresponsive |LAFAYETTECO.MIDDLE |LAFAYETTECO.C-1 
a 
| [Nonresponsive |LAKELANDELEM. LAKELAND RM 
po Ves LAMAR MIDDLE LAMAR RA 
po VeS CC LAQUEYR-VHIGH OO LAQUEYR-VO 
Pp VES LAREDO ELEM. LAREDOR-VIN 
po VeS CEDAR CREEKELEM. SS LEE'SSUMMITR-VII 
po VeSCHAZELGROVEELEM. LEE'SSUMMITR-VIE 
po VES SS |MEADOW LANE ELEM. LEE'SSUMMITR-VIE 
po VES RICHARDSON ELEM. LEE'SSUMMITR-VIE 
po VeSSUMMITPOINTE ELEM. LEE'SSUMMITR-VIE 
PINOT HILLTOP SCHOOL LEE'SSUMMITR-VIE 
po INO LEE'SSUIMMITSR.HIGH LEE'SSUMMITR-VII 
po VeSLEETONELEM. OO LEETONR-X 
po vesHIGHLANDELEM. —LEWISCCO.C-D 
po VeS TT LIBERALHIGH LIBERAL 
po VeS LIBERTY OAKS ELEM. LIBERTY53_— 
| ___|Nonresponsive [LIBERTY MIDDLESCHOOL LIBERTY53_— 
po VeS TT LINCOLNELEM. LINCOLN 
Pp VES ROBERT H. SPERRENG MIDDLE _|LINDBERGH SCHOOLS 
po VES CC LONEJACKHIGH School LONEJACKC-6 
po VeS CC LONEDELLELEM. SS LONEDELLR-XIV 
Pp VES IMADISON ELEM. JMADISONC-3— 
po VES SS IMANSFIELDHIGH OS MANSFIELDRIV 
| ___|Nonresponsive |MRHELEMENTARY | MAPLEWOOD-RICHMOND HEIGHTS 
| __[Nonresponsive |MARCELINE MIDDLE |MARCELINERV 
Pp VES MARION C.EARLYELEM. MARION C.EARLYR-VO 
po VES MARION C.EARLYHIGH MARION C.EARLYR-VO 
PINOT IMARIONVILLEELEM. SS JMARIONVILLER-IX 
po INO IMARIONVILLEMIDDLE SS JMARIONVILLER-IX 
| __[Nonresponsive |BUEKERMIDDLE MARSHALL 
| ___[Nonresponsive _[MARSHALLSR.HIGH MARSHALL 
PINOT MARYVILLEHIGH MARYVILLE“ 
PINOT MARYVILLEMIDDLE MARYVILLE“ 
PINOT INORTHWESTTECHNICALSCH. |MARYVILLER-H 
po vesINOELELEM. MCDONALD COWRA 
| __|Nonresponsive |BEASLEYELEM. SS JMEHLVILLERIX 
VES IMISSOURICITYELEM. ss MISSOURICITYS6, 
po Ves IMOBERLYMIDDLE SS JMOBERLY, 
Ves [NORTH CENTRALREGIONAL|MOBERLY, 
| {Monett R-1 School District___—|MonettR-1Schools 
Ves CALIFORNIA ELEM. |MONITEAUCO.RA 
Ves [MONROE CITYMIDDLE Ss |MONROECITYR| 

| {Morgan County R-1 Elementary [Morgan County R-1 School district 

VES [MOUNTAINGROVEELEM. — |MOUNTAINGROVE R-II 

Ves IMT. VERNON MIDDLE |MT.VERNONRV 
PINOT INAYLOR ELEM. NAYLOR 

Ves NEELYVILLEHIGH SS NEELYVILLERIV 

Ves CENTRALELEM. SC |NEOSHOR-VO 

| VES IGEORGE WASHINGTON CARVER ELEJNEOSHOR-VO 


YES 
YES 
YES 
N 
YES 
YES 
YES 
YES 
YES 


INO [NORTH DAVIESS ELEM. NORTH DAVIESSRI 
INO [MAPLE PARK MIDDLE [NORTH KANSAS CITY 74 
INO JORCHARD FARM MIDDLE JORCHARDFARMR-VO 
PARKWAY C-2 

INOW BARRYSCH. OS PLATTECO.RM 
INO [DONALD D. SIEGRISTELEM. | PLATTE CO. R-II 

| | Polo R-VII High School | Polo R-VII School District
| | Polo R-VII School District | Polo R-VII School District
| | John Evans Middle School | Potosi R-III School District
| | Potosi Elementary School | Potosi R-III School District
| | Potosi High School | Potosi R-III School District
| | Trojan Intermediate School | Potosi R-III School District
| ___[Nonresponsive _|MARKTWAINSR.HIGH RALLSCO.R-I 
Pp VeS—CBLUERIDGEELEM. CS JRAYTOWNC-2 
po VeSSFLEETRIDGE ELEM. SS JRAYTOWNC-20 
| VES CREEDS SPRING INTERMEDIATE |REEDSSPRINGR-IV 
| VES CREEDS SPRING MIDDLE REEDS SPRINGR-IV 
po VeS LYON ELEMENTARY REPUBLICR-IN 
Pp VES PRICE ELEMENTARY REPUBLICR-IM 
po VeSREPUBLICHIGH OS REPUBLICR-IN 
po VeS SC REPUBLICMIDDLE SS REPUBLICR-IN 
| [Nonresponsive [RICHHILLHIGH SS RICHHILLR-IV 




| - [Nonresponsive _]RICHARDS ELEM RICHARDS R-V 
| YES. sSRICHLAND ELEM RICHLAND R-IV 
| YES. SSsSSUINRISE ELEM. RICHMOND R-XVI 


aun Ripley County R-III School District [Ripley County R-II] School District 


| __|Nonresponsive |WYLANDELEM. SS RITENOUR, 
VES CRESTVIEW MIDDLE 
VES RIDGE MEADOWS ELEM. 
VES WESTRIDGE ELEM. SS JROCKWOODRVI 
| ___|Nonresponsive _|LAFAYETTESR.HIGH |ROCKWOODRVI 
| VES 
NEW | [Salisbury High School SalisburyR-IV 
po ves SANTAFE ELEM. SANTA FE R-X 

Pp INOISANTAFEHIGH SANTA FE R-X 

Pp VeS [WILDWOOD ELEM. SARCOXIERM 
ses 
ee 
| [Nonresponsive [JOHNGLENNELEM. [SAVANNAH RN 
| __|Nonresponsive |OSAGE MIDDLE |SCHOOLOFTHEOSAGE 
po ves SCOTT CITYMIDDLE SCOTT CITYRA 
po INO THOMAS W.KELLYHIGH SCOTTCOWRWV 
pT VES [SCUOLA VITANUOVA CHARTER |SCUOLAVITANUOVA 
po VeS SC THORNERSVILLE MIDDLE |SENATH-HORNERSVILLEC-8 
| ___|Nonresponsive_|SENATH-HORNERSVILLE SR. HIGH _|SENATH-HORNERSVILLEC-8 
| ___[Nonresponsive_|SENECAINTERMEDIATE SCHOOL _|SENECAR-VIE 
po ivesCSEYMOURHIGH OO SEYMOURRME 
po veSsSHELBINA ELEM. SHELBYCO.RIV 
NEW | Clarence Elementary [Shelby County R-IV School District__ 
po Ves 7TH AND STHGRADECTR. |SIKESTONR-6 
NEW | i SilexHigh School Silex RISchool District 
po INOSSLATER HIGH SLATER 
po veSSSIMITHTON ELEM. SMITHTONR-VI 
po ves TS SIMITHVILLE UPPER ELEM. |SMITHVILLER-M 
NEW | South Callaway Rll Elementary [South Callaway R-II School District 


a South Callaway R-II High School South Callaway R-II School District 
ae | South Callaway R-II Middle School |South Callaway R-II School District 


PINOT SOUTHIRONELEM. SS |SOUTHIRONCO.RA 
| __|Nonresponsive [SOUTH PEMISCOTELEM. [SOUTH PEMISCOTCO.R-V_ 
| ___|Nonresponsive_|SOUTHERN BOONE MIDDLE [SOUTHERN BOONECO.R-} 
p VES [SOUTHWEST LIVINGSTON CO R-1 EL|SOUTHWEST LIVINGSTON CO. RI 
| __|Nonresponsive |EXTERNALSITES SPECL.SCH.DST.ST.LOUISCO. 
| __|Nonresponsive _|SPICKARDELEM. ss SPICKARD RW 
Pp VeS SPOKANE MIDDLE SPOKANER-VIE 
p ives SPRING BLUFF ELEM. SPRING BLUFFR-XV 
Pp Ves BINGHAM ELEM. SPRINGFIELDR-XI 
po vesICAMPBELLELEM. SS SPRINGFIELDR-XI 
Pp VES PLEASANT VIEW MIDDLE |SPRINGFIELDR-XII 
PINOT DELAWARE ELEM. SPRINGFIELDR-XII 
po INO HOLLAND ELEM. SPRINGFIELDR-XI 
p INO IMCGREGOR ELEM. SPRINGFIELDR-XI 
PINOT WILLIAMS ELEM. ISPRINGFIELDR-XII 


Pp NOSSO ELEM. ——C—~—~C~—CSUSPRRINGGFELD XC 
| YES | BODE MIDDLE | ST. JOSEPH
| YES | EDISON ELEM. | ST. JOSEPH
| YES | HALL ELEM. | ST. JOSEPH
| YES | MARK TWAIN ELEM. | ST. JOSEPH
| YES | ROBIDOUX MIDDLE | ST. JOSEPH
| YES | BUDER ELEM. | ST. LOUIS CITY
| YES | DUNBAR AND BR. | ST. LOUIS CITY
| YES | EARL NANCE, SR. ELEM. | ST. LOUIS CITY
| YES | FARRAGUT ELEM. | ST. LOUIS CITY
| YES | JEFFERSON ELEM. | ST. LOUIS CITY
| CLES 
| _C‘YES 
| _—Yes 
YES 
YES 
YES 
YES 
YES 
NO 
NEW NO 
Nonresponsive 
Nonresponsive 
YES 
YES 
YES 
YES 
NO 
YES 
Nonresponsive 
YES 
YES 
YES 
YES 
NO 
YES 
YES 
NEW 
YES 
Nonresponsive 
YES 
NEW 
NEW 
NEW 
NEW 
NEW 
NEW 
NEW 
YES 
NO 
YES 
YES 
YES 


NEW 
NEW 
NEW 
NEW 
NEW 
NEW 


YES 
YES 
NO 


YES 
YES 


WEST PLAINS ELEM. 


MO SCHOOL FOR THE DEAF 

East Elementary 

North Elementary 

Orchard Hills Elementary 
outh Elementary 

Willard Intermediate Schools 

Willard Middle School 

WILLOW SPRINGS MIDDLE 
ALMA ELEM. 


Appendix O - 2014 Field Test School Districts and Charter Schools


Page 33 of 39 


Date TestName DISTRICTNAME 
5/19/2014 2:00 HS-ELA-PT-A New ADAIR CO. R-I 
5/19/2014 2:00 HS-ELA-PT-A New ADAIR CO. R-I 
5/19/2014 2:00 Math-PT-Cell Phor ADAIR CO. R-I 
5/19/2014 2:00 SBAC-GO6-Math-N ADAIR CO. R-I 
5/19/2014 2:00 SBAC-GO6-Math-N ADAIR CO. R-I 
5/19/2014 2:00 SBAC-GO6-Math-N ADAIR CO. R-I 
5/19/2014 2:00 SBAC-GO6-Math-N ADAIR CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-Nonl ADAIR CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-Nonl ADAIR CO. R-I 
5/19/2014 2:00 ELA-PT-Archeologi ADVANCE R-IV 
5/19/2014 2:00 Math-PT-South Po ADVANCE R-IV 
5/19/2014 2:00 Math-PT-South Po ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-ELA-No! ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-Math-N ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-Math-N ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-Math-N ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-Math-N ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-Math-N ADVANCE R-IV 
5/19/2014 2:00 SBAC-GO8-Math-N ADVANCE R-IV 
5/19/2014 2:00 ELA-PT-Deserts AFFTON 101 
5/19/2014 2:00 ELA-PT-Deserts-A AFFTON 101 
5/19/2014 2:00 ELA-PT-Renewable AFFTON 101 
5/19/2014 2:00 SBAC-GO5-ELA-No! AFFTON 101 
5/19/2014 2:00 SBAC-GO5-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO5-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO5-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO5-ELA-No! AFFTON 101 
5/19/2014 2:00 SBAC-GO8-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO8-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO8-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO8-ELA-Noi AFFTON 101 
5/19/2014 2:00 SBAC-GO8-ELA-No! AFFTON 101 
5/19/2014 2:00 Math-PT-South Po ALBANY R-III 
5/19/2014 2:00 SBAC-GO8-Math-N ALBANY R-III 
5/19/2014 2:00 SBAC-GO8-Math-N ALBANY R-III 
5/19/2014 2:00 SBAC-GO8-Math-N ALBANY R-III 
5/19/2014 2:00 SBAC-GO8-Math-N ALBANY R-III 
5/19/2014 2:00 SBAC-GO8-Math-N ALBANY R-III 
5/19/2014 2:00 ELA-PT-Marine Ani APPLETON CITY R- 
5/19/2014 2:00 ELA-PT-Marine Ani APPLETON CITY R- 
5/19/2014 2:00 SBAC-GO5-ELA-No! APPLETON CITY R- 
5/19/2014 2:00 SBAC-GO5-ELA-Noi APPLETON CITY R- 


Opportunity 


PRPrPP PPP PPP PPP PRP PRP PRP RP PRP RP RP RP RP RP RP RP PRP RP RP RPP PRP PP PRP PR 


TotalStudent 


29 


NO NaN FP DD We 


TotalStudentStarted TotalStudentCompleted PercentStarted


21 
0 
17 
4 


NO 
0d 


PFPUuOnnaraAn WI © 


32 


OO OON ON OD DY 


21 
0 
17 
4 


NO 
oO 


FPFPuannrAn UW © 


32 


OOO ON ON OD DY 


91.30% 
0.00% 
100.00% 
100.00% 
100.00% 
75.00% 
100.00% 
95.65% 
0.00% 
0.00% 
0.00% 
96.55% 
0.00% 
100.00% 
100.00% 
100.00% 
85.71% 
100.00% 
100.00% 
50.00% 
93.33% 
96.13% 
95.56% 
95.65% 
50.00% 
86.67% 
95.45% 
91.67% 
97.22% 
97.22% 
94.44% 
91.89% 
91.43% 
100.00% 
100.00% 
87.50% 
75.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 


PercentCompleted 
91.30% 
0.00% 
100.00% 
100.00% 
100.00% 
75.00% 
100.00% 
73.91% 
0.00% 
0.00% 
0.00% 
96.55% 
0.00% 
100.00% 
100.00% 
100.00% 
85.71% 
100.00% 
100.00% 
50.00% 
88.89% 
95.58% 
51.11% 
45.65% 
50.00% 
57.78% 
65.91% 
91.67% 
94.44% 
97.22% 
94.44% 
89.19% 
91.43% 
100.00% 
100.00% 
87.50% 
75.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 


5/19/2014 2:00 SBAC-GO5-ELA-Noi APPLETON CITY R- 
5/19/2014 2:00 SBAC-GO5-ELA-Noi APPLETON CITY R- 
5/19/2014 2:00 SBAC-GO5-ELA-No! APPLETON CITY R- 
5/19/2014 2:00 ELA-PT-Archeologi ARCADIA VALLEY F 
5/19/2014 2:00 SBAC-GO8-ELA-Noi ARCADIA VALLEY F 
5/19/2014 2:00 SBAC-GO8-ELA-No! ARCADIA VALLEY F 
5/19/2014 2:00 SBAC-GO8-ELA-No! ARCADIA VALLEY F 
5/19/2014 2:00 SBAC-GO8-ELA-Noi ARCADIA VALLEY F 
5/19/2014 2:00 SBAC-GO8-ELA-Noi ARCADIA VALLEY F 
5/19/2014 2:00 ELA-PT-Archeologi ATLANTA C-3 
5/19/2014 2:00 SBAC-GO8-ELA-No! ATLANTA C-3 
5/19/2014 2:00 SBAC-GO8-ELA-Noi ATLANTA C-3 
5/19/2014 2:00 ELA-PT-Uncommo! AVILLA R-XIII 
5/19/2014 2:00 SBAC-GO4-ELA-No! AVILLA R-XIII 
5/19/2014 2:00 SBAC-GO4-ELA-No! AVILLA R-XIII 
5/19/2014 2:00 SBAC-GO4-ELA-No! AVILLA R-XIII 
5/19/2014 2:00 SBAC-GO4-ELA-No! AVILLA R-XIII 
5/19/2014 2:00 SBAC-GO4-ELA-No! AVILLA R-XIII 
5/19/2014 2:00 ELA-PT-Technolog\ BAKERSFIELD R-IV 
5/19/2014 2:00 Math-PT-Donuts BAKERSFIELD R-IV 
5/19/2014 2:00 SBAC-GO7-ELA-Noi BAKERSFIELD R-IV 
5/19/2014 2:00 SBAC-GO7-Math-N BAKERSFIELD R-IV 
5/19/2014 2:00 HS-Math-PT-Great BAYLESS 
5/19/2014 2:00 SBAC-HS-Math-No BAYLESS 
5/19/2014 2:00 ELA-PT-Land Form BELL CITY R-II 
5/19/2014 2:00 HS-Math-PT-Great BELL CITY R-II 
5/19/2014 2:00 SBAC-GO3-ELA-No! BELL CITY R-II 
5/19/2014 2:00 SBAC-HS-Math-No BELL CITY R-II 
5/19/2014 2:00 ELA-PT-Renewable BELTON 124 
5/19/2014 2:00 ELA-PT-Renewable BELTON 124 
5/19/2014 2:00 Math-PT-Commun BELTON 124 
5/19/2014 2:00 Math-PT-Order Fo BELTON 124 
5/19/2014 2:00 Math-PT-Turtle Ha BELTON 124 
5/19/2014 2:00 SBAC-GO4-Math-N BELTON 124 
5/19/2014 2:00 SBAC-GO4-Math-N BELTON 124 
5/19/2014 2:00 SBAC-GO4-Math-N BELTON 124 
5/19/2014 2:00 SBAC-GO4-Math-N BELTON 124 
5/19/2014 2:00 SBAC-GO04-Math-N BELTON 124 
5/19/2014 2:00 SBAC-GO7-ELA-Noi BELTON 124 
5/19/2014 2:00 SBAC-GO7-ELA-Noi BELTON 124 
5/19/2014 2:00 SBAC-GO7-ELA-Noi BELTON 124 
5/19/2014 2:00 SBAC-GO7-ELA-Noi BELTON 124 
5/19/2014 2:00 SBAC-GO7-ELA-Noi BELTON 124 


PRP PPP RPP PPP PRP RP PRP RP PRP PRP RP RP PRP RP PRP RPP RP PRP RP RP PRP RP PRP RP RP RP PR 


oO O 


21 


OOo OW fu fp UH 


22 


Oo oO O 


21 


OOo Oo WwW fF uP UM 


22 


0.00% 
0.00% 
0.00% 
89.39% 
92.31% 
85.71% 
84.62% 
92.86% 
91.67% 
0.00% 
100.00% 
100.00% 
95.45% 
100.00% 
100.00% 
100.00% 
100.00% 
75.00% 
0.00% 
0.00% 
0.00% 
91.67% 
68.80% 
0.00% 
0.00% 
87.50% 
86.96% 
87.50% 
0.00% 
95.34% 
93.75% 
87.13% 
85.07% 
91.49% 
89.13% 
93.62% 
93.48% 
97.83% 
97.67% 
98.84% 
0.00% 
95.35% 
97.65% 


0.00% 
0.00% 
0.00% 
89.39% 
92.31% 
85.71% 
84.62% 
92.86% 
91.67% 
0.00% 
93.75% 
100.00% 
95.45% 
100.00% 
100.00% 
100.00% 
100.00% 
75.00% 
0.00% 
0.00% 
0.00% 
91.67% 
61.60% 
0.00% 
0.00% 
87.50% 
86.96% 
81.25% 
0.00% 
94.46% 
93.75% 
87.13% 
85.07% 
89.36% 
89.13% 
93.62% 
93.48% 
97.83% 
95.35% 
96.51% 
0.00% 
95.35% 
95.29% 


5/19/2014 2:00 HS-Math-PT-Great BERNIE R-XIII 
5/19/2014 2:00 SBAC-HS-Math-No BERNIE R-XIII 
5/19/2014 2:00 HS-Math-PT-Great BEVIER C-4 
5/19/2014 2:00 SBAC-HS-Math-No BEVIER C-4 
5/19/2014 2:00 HS-ELA-PT-A New BISMARCK R-V 
5/19/2014 2:00 HS-ELA-PT-A New BISMARCK R-V 
5/19/2014 2:00 HS-Math-PT-Great BISMARCK R-V 
5/19/2014 2:00 SBAC-HS-ELA-Nonl BISMARCK R-V 
5/19/2014 2:00 SBAC-HS-ELA-Nonl BISMARCK R-V 
5/19/2014 2:00 SBAC-HS-ELA-Nonl BISMARCK R-V 
5/19/2014 2:00 SBAC-HS-Math-No BISMARCK R-V 
5/19/2014 2:00 HS-Math-PT-Great BLUE EYE R-V 
5/19/2014 2:00 SBAC-HS-Math-No BLUE EYE R-V 
5/19/2014 2:00 ELA-PT-Animals W BLUE SPRINGS R-I\ 
5/19/2014 2:00 ELA-PT-Deserts-A BLUE SPRINGS R-I\ 
5/19/2014 2:00 ELA-PT-Technolog\ BLUE SPRINGS R-I\ 
5/19/2014 2:00 ELA-PT-The Americ BLUE SPRINGS R-I\ 
5/19/2014 2:00 ELA-PT-Trees-A | BLUE SPRINGS R-I\ 
5/19/2014 2:00 HS-Math-PT-Great BLUE SPRINGS R-I\ 
5/19/2014 2:00 Math-PT-Order Fo BLUE SPRINGS R-I\ 
5/19/2014 2:00 Math-PT-Sandbox- BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO3-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO3-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO3-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO3-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO4-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO4-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO4-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO4-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO4-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-G04-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO04-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-G04-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO04-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO04-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-ELA-No! BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-Math-N BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO5-Math-N BLUE SPRINGS R-I\ 


PRP PrP PrP PrP PPP PRP RP PRP RP RPP PRP RP RP RP RP RP RPP PRP RP RP PRP RP RP RRP RP RP PRP PR 


238 


601 
147 


ae 
nO OO 


OOOO OOO oO Oo 


68 


a 
nO O 


OOOO OOO oO Oo 


67 


61.29% 
61.29% 
77.18% 
88.89% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
94.44% 
77.11% 
86.55% 
95.00% 
68.06% 
0.00% 
89.12% 
89.41% 
65.22% 
72.00% 
70.83% 
94.44% 
93.75% 
82.35% 
87.50% 
64.71% 
64.71% 
93.33% 
86.21% 
90.00% 
86.21% 
96.55% 
100.00% 
85.00% 
95.00% 
100.00% 
81.82% 
85.71% 
90.48% 
90.48% 


61.29% 
61.29% 
77.18% 
88.89% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 
93.06% 
75.90% 
85.71% 
93.75% 
66.67% 

0.00% 
88.44% 
89.41% 
60.87% 
72.00% 
70.83% 
12.50% 
75.00% 
82.35% 
81.25% 
58.82% 
52.94% 
90.00% 
79.31% 
86.67% 
82.76% 
89.66% 
80.00% 
65.00% 
90.00% 
80.00% 
81.82% 
85.71% 
85.71% 
90.48% 


5/19/2014 2:00 SBAC-GO6-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO6-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO6-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-GO6-ELA-Noi BLUE SPRINGS R-I\ 
5/19/2014 2:00 SBAC-HS-Math-No BLUE SPRINGS R-I\ 
5/19/2014 2:00 ELA-PT-Growth an BOLIVAR R-I 
5/19/2014 2:00 Math-PT-Donuts-A BOLIVAR R-I 
5/19/2014 2:00 Math-PT-Talent Sh BOLIVAR R-I 
5/19/2014 2:00 SBAC-GO6-ELA-No! BOLIVAR R-I 
5/19/2014 2:00 SBAC-GO6-Math-N BOLIVAR R-I 
5/19/2014 2:00 SBAC-GO6-Math-N BOLIVAR R-I 
5/19/2014 2:00 SBAC-GO6-Math-N BOLIVAR R-I 
5/19/2014 2:00 SBAC-GO6-Math-N BOLIVAR R-I 
5/19/2014 2:00 SBAC-GO6-Math-N BOLIVAR R-I 
5/19/2014 2:00 HS-ELA-PT-A New BOWLING GREEN | 
5/19/2014 2:00 HS-ELA-PT-A New BOWLING GREEN | 
5/19/2014 2:00 Math-PT-Sandbox- BOWLING GREEN | 
5/19/2014 2:00 SBAC-GO5-Math-N BOWLING GREEN | 
5/19/2014 2:00 SBAC-GO5-Math-N BOWLING GREEN | 
5/19/2014 2:00 SBAC-GO5-Math-N BOWLING GREEN | 
5/19/2014 2:00 SBAC-GO5-Math-N BOWLING GREEN | 
5/19/2014 2:00 SBAC-HS-ELA-NonI BOWLING GREEN | 
5/19/2014 2:00 SBAC-HS-ELA-NonI BOWLING GREEN | 
5/19/2014 2:00 SBAC-HS-ELA-Non!| BOWLING GREEN | 
5/19/2014 2:00 Math-PT-Donuts BRADLEYVILLE R-I 
5/19/2014 2:00 SBAC-GO7-Math-N BRADLEYVILLE R-I 
5/19/2014 2:00 ELA-PT-Importanct BRANSON R-IV 
5/19/2014 2:00 SBAC-GO6-ELA-No! BRANSON R-IV 
5/19/2014 2:00 SBAC-GO6-ELA-No! BRANSON R-IV 
5/19/2014 2:00 SBAC-GO6-ELA-No! BRANSON R-IV 
5/19/2014 2:00 SBAC-GO6-ELA-No! BRANSON R-IV 
5/19/2014 2:00 ELA-PT-Trees-A BRAYMER C-4 
5/19/2014 2:00 Math-PT-Baseball- BRAYMER C-4 
5/19/2014 2:00 SBAC-G04-ELA-No! BRAYMER C-4 
5/19/2014 2:00 SBAC-G04-ELA-No: BRAYMER C-4 
5/19/2014 2:00 SBAC-GO4-ELA-No! BRAYMER C-4 
5/19/2014 2:00 SBAC-GO4-ELA-No! BRAYMER C-4 
5/19/2014 2:00 SBAC-G04-ELA-No! BRAYMER C-4 
5/19/2014 2:00 SBAC-GO8-Math-N BRAYMER C-4 
5/19/2014 2:00 SBAC-GO8-Math-N BRAYMER C-4 
5/19/2014 2:00 SBAC-GO8-Math-N BRAYMER C-4 
5/19/2014 2:00 HS-ELA-PT-A New BRECKENRIDGE R-I
5/19/2014 2:00 HS-Math-PT-Great BRECKENRIDGE R-I




21 




21 




81.36% 
81.67% 
88.14% 
93.33% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
84.00% 
83.02% 
100.00% 
0.00% 
93.75% 
95.00% 
95.00% 
100.00% 
95.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
97.78% 
97.78% 
97.78% 
100.00% 
97.78% 
95.00% 
91.30% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
88.89% 
90.00% 
100.00% 
0.00% 
100.00% 


74.58% 
73.33% 
81.36% 
88.33% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
84.00% 
83.02% 
100.00% 
0.00% 
92.50% 
95.00% 
95.00% 
100.00% 
95.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
97.78% 
95.56% 
95.56% 
100.00% 
97.78% 
95.00% 
91.30% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
88.89% 
90.00% 
100.00% 
0.00% 
100.00% 


5/19/2014 2:00 SBAC-HS-ELA-NonI BRECKENRIDGE R- 
5/19/2014 2:00 SBAC-HS-Math-No BRECKENRIDGE R- 
5/19/2014 2:00 Math-PT-Donuts-A BROOKFIELD R-III 
5/19/2014 2:00 SBAC-GO6-Math-N BROOKFIELD R-III 
5/19/2014 2:00 SBAC-GO6-Math-N BROOKFIELD R-III 
5/19/2014 2:00 SBAC-GO6-Math-N BROOKFIELD R-III 
5/19/2014 2:00 SBAC-GO6-Math-N BROOKFIELD R-III 
5/19/2014 2:00 ELA-PT-Uncommo! BRUNSWICK R-II 
5/19/2014 2:00 Math-PT-Animal Jt BRUNSWICK R-II 
5/19/2014 2:00 SBAC-GO4-ELA-Noi BRUNSWICK R-II 
5/19/2014 2:00 SBAC-GO4-Math-N BRUNSWICK R-II 
5/19/2014 2:00 HS-Math-PT-Great BUCHANAN CO. R- 
5/19/2014 2:00 SBAC-HS-Math-No BUCHANAN CO. R- 
5/19/2014 2:00 ELA-PT-Archeologi BUTLER R-V 
5/19/2014 2:00 Math-PT-Turtle Ha BUTLER R-V 
5/19/2014 2:00 Math-PT-Turtle Ha BUTLER R-V 
5/19/2014 2:00 SBAC-GO4-Math-N BUTLER R-V 
5/19/2014 2:00 SBAC-G04-Math-N BUTLER R-V 
5/19/2014 2:00 SBAC-GO4-Math-N BUTLER R-V 
5/19/2014 2:00 SBAC-GO4-Math-N BUTLER R-V 
5/19/2014 2:00 SBAC-GO4-Math-N BUTLER R-V 
5/19/2014 2:00 SBAC-GO4-Math-N BUTLER R-V 
5/19/2014 2:00 SBAC-GO8-ELA-Noi BUTLER R-V 
5/19/2014 2:00 SBAC-GO8-ELA-Noi BUTLER R-V 
5/19/2014 2:00 SBAC-GO8-ELA-Noi BUTLER R-V 
5/19/2014 2:00 SBAC-GO8-ELA-Noi BUTLER R-V 
5/19/2014 2:00 SBAC-GO8-ELA-Noi BUTLER R-V 
5/19/2014 2:00 Math-PT-Donuts-A CAINSVILLE R-I 
5/19/2014 2:00 SBAC-GO6-Math-N CAINSVILLE R-I 
5/19/2014 2:00 SBAC-GO6-Math-N CAINSVILLE R-I 
5/19/2014 2:00 SBAC-GO6-Math-N CAINSVILLE R-I 
5/19/2014 2:00 SBAC-GO6-Math-N CAINSVILLE R-I 
5/19/2014 2:00 ELA-PT-Uncommoi CALLAO C-8 
5/19/2014 2:00 ELA-PT-Uncommoi CALLAO C-8 
5/19/2014 2:00 Math-PT-Animal Jt CALLAO C-8 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CALLAO C-8 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CALLAO C-8 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CALLAO C-8 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CALLAO C-8 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CALLAO C-8 
5/19/2014 2:00 SBAC-GO4-ELA-No! CALLAO C-8 
5/19/2014 2:00 SBAC-G04-Math-N CALLAO C-8 
5/19/2014 2:00 HS-ELA-PT-A New CAMDENTON R-III 




17 




250 


13 




11 




0.00% 
100.00% 
97.26% 
100.00% 
100.00% 
100.00% 
89.47% 
0.00% 
94.12% 
0.00% 
100.00% 
0.00% 
64.71% 
25.88% 
0.00% 
1.90% 
95.24% 
95.24% 
50.00% 
90.48% 
85.71% 
95.24% 
64.71% 
76.47% 
41.18% 
29.41% 
17.65% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 


0.00% 
100.00% 
97.26% 
100.00% 
100.00% 
100.00% 
89.47% 
0.00% 
94.12% 
0.00% 
100.00% 
0.00% 
64.71% 
1.18% 
0.00% 
0.95% 
61.90% 
38.10% 
0.00% 
57.14% 
52.38% 
33.33% 
0.00% 
11.76% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
50.00% 
100.00% 
100.00% 
50.00% 
0.00% 
0.00% 
0.00% 


5/19/2014 2:00 Math-PT-Making S CAMDENTON R-III 
5/19/2014 2:00 SBAC-GO3-Math-N CAMDENTON R-III 
5/19/2014 2:00 SBAC-HS-ELA-Nonl CAMDENTON R-III 
5/19/2014 2:00 SBAC-HS-ELA-Nonl CAMDENTON R-III 
5/19/2014 2:00 Math-PT-Talent Sh CAMERON R-I 
5/19/2014 2:00 SBAC-GO6-Math-N CAMERON R-I 
5/19/2014 2:00 SBAC-GO6-Math-N CAMERON R-I 
5/19/2014 2:00 Math-PT-Commun CANTON R-V 
5/19/2014 2:00 SBAC-GO4-Math-N CANTON R-V 
5/19/2014 2:00 SBAC-GO4-Math-N CANTON R-V 
5/19/2014 2:00 SBAC-GO4-Math-N CANTON R-V 
5/19/2014 2:00 SBAC-GO4-Math-N CANTON R-V 
5/19/2014 2:00 SBAC-GO4-Math-N CANTON R-V 
5/19/2014 2:00 ELA-PT-Animals W CAPE GIRARDEAU 
5/19/2014 2:00 ELA-PT-Animals W CAPE GIRARDEAU 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CAPE GIRARDEAU 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CAPE GIRARDEAU 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CAPE GIRARDEAU 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CAPE GIRARDEAU 


5/19/2014 2:00 HS-Math-PT-Great CARL JUNCTION R- 
5/19/2014 2:00 HS-Math-PT-Great CARL JUNCTION R- 
5/19/2014 2:00 SBAC-GO8-Math-N CARL JUNCTION R- 
5/19/2014 2:00 SBAC-GO8-Math-N CARL JUNCTION R- 
5/19/2014 2:00 SBAC-GO8-Math-N CARL JUNCTION R- 
5/19/2014 2:00 SBAC-GO8-Math-N CARL JUNCTION R- 
5/19/2014 2:00 SBAC-GO8-Math-N CARL JUNCTION R- 
5/19/2014 2:00 SBAC-GO8-Math-N CARL JUNCTION R- 


5/19/2014 2:00 ELA-PT-Archeologi CARROLLTON R-VI 
5/19/2014 2:00 Math-PT-Turtle Ha CARROLLTON R-VI 
5/19/2014 2:00 Math-PT-Turtle Ha CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO4-Math-N CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO4-Math-N CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO4-Math-N CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO4-Math-N CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO4-Math-N CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO4-Math-N CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO8-ELA-Noi CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO8-ELA-Noi CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO8-ELA-No! CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO8-ELA-No: CARROLLTON R-VI 
5/19/2014 2:00 SBAC-GO8-ELA-No! CARROLLTON R-VI 
5/19/2014 2:00 ELA-PT-Animals W CARTHAGE R-IX 

5/19/2014 2:00 ELA-PT-Animals W CARTHAGE R-IX 




250 
250 
140 
110 
125 
62 
63 
35 




230 




230 




92.00% 
0.40% 
0.00% 
0.00% 
76.80% 
74.19% 
79.37% 
88.57% 
85.71% 
71.43% 
85.71% 
100.00% 
85.71% 
100.00% 
88.52% 
95.00% 
100.00% 
95.24% 
100.00% 
100.00% 
93.54% 
100.00% 
100.00% 
100.00% 
20.93% 
23.08% 
32.69% 
98.48% 
100.00% 
80.95% 
91.67% 
84.62% 
100.00% 
100.00% 
92.31% 
53.85% 
100.00% 
92.31% 
100.00% 
100.00% 
100.00% 
100.00% 

87.50% 


92.00% 
0.40% 
0.00% 
0.00% 
74.40% 
51.61% 
53.97% 
88.57% 
85.71% 
71.43% 
85.71% 
100.00% 
85.71% 
100.00% 
88.52% 
90.00% 
95.00% 
85.71% 
100.00% 
100.00% 
93.54% 
100.00% 
100.00% 
100.00% 
20.93% 
23.08% 
32.69% 
96.97% 
100.00% 
80.95% 
91.67% 
84.62% 
100.00% 
100.00% 
92.31% 
53.85% 
100.00% 
92.31% 
100.00% 
100.00% 
100.00% 
100.00% 

87.50% 


5/19/2014 2:00 Math-PT-Animal Jt CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO3-ELA-Noi CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO4-Math-N CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO4-Math-N CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO4-Math-N CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO4-Math-N CARTHAGE R-IX 
5/19/2014 2:00 SBAC-GO4-Math-N CARTHAGE R-IX 
5/19/2014 2:00 Math-PT-Cell Phor CASSVILLE R-IV 
5/19/2014 2:00 SBAC-GO7-Math-N CASSVILLE R-IV 
5/19/2014 2:00 SBAC-GO7-Math-N CASSVILLE R-IV 
5/19/2014 2:00 SBAC-GO7-Math-N CASSVILLE R-IV 
5/19/2014 2:00 SBAC-GO7-Math-N CASSVILLE R-IV 
5/19/2014 2:00 Math-PT-Order Fo CENTER 58 
5/19/2014 2:00 Math-PT-Talent Sh CENTER 58 
5/19/2014 2:00 SBAC-GO3-Math-N CENTER 58 
5/19/2014 2:00 SBAC-GO5-Math-N CENTER 58 
5/19/2014 2:00 SBAC-GO5-Math-N CENTER 58 
5/19/2014 2:00 SBAC-GO5-Math-N CENTER 58 
5/19/2014 2:00 SBAC-GO5-Math-N CENTER 58 
5/19/2014 2:00 HS-ELA-PT-A New CHAFFEE R-II 
5/19/2014 2:00 HS-ELA-PT-A New CHAFFEE R-II 
5/19/2014 2:00 Math-PT-Turtle Ha CHAFFEE R-II 
5/19/2014 2:00 SBAC-GO4-Math-N CHAFFEE R-II 
5/19/2014 2:00 SBAC-GO4-Math-N CHAFFEE R-II 
5/19/2014 2:00 SBAC-GO4-Math-N CHAFFEE R-II 
5/19/2014 2:00 SBAC-GO04-Math-N CHAFFEE R-II 
5/19/2014 2:00 SBAC-GO4-Math-N CHAFFEE R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI CHAFFEE R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI CHAFFEE R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI CHAFFEE R-II 
5/19/2014 2:00 ELA-PT-Land Form CHILLICOTHE R-II 
5/19/2014 2:00 Math-PT-Talent Sh CHILLICOTHE R-II 
5/19/2014 2:00 Math-PT-Talent Sh CHILLICOTHE R-II 
5/19/2014 2:00 SBAC-GO3-ELA-No! CHILLICOTHE R-II 
5/19/2014 2:00 SBAC-GO6-Math-N CHILLICOTHE R-II 
5/19/2014 2:00 SBAC-GO6-Math-N CHILLICOTHE R-II 
5/19/2014 2:00 SBAC-GO6-Math-N CHILLICOTHE R-II 
5/19/2014 2:00 SBAC-GO6-Math-N CHILLICOTHE R-II 
5/19/2014 2:00 SBAC-GO6-Math-N CHILLICOTHE R-II 
5/19/2014 2:00 ELA-PT-Archeologi CITY GARDEN MOI 




157 


101 


93.55% 
100.00% 
90.91% 
100.00% 
100.00% 
94.12% 
94.74% 
100.00% 
84.21% 
94.74% 
0.00% 
100.00% 
100.00% 
68.75% 
74.47% 
42.71% 
35.64% 
42.71% 
32.00% 
36.00% 
40.00% 
34.62% 
83.87% 
0.00% 
91.30% 
100.00% 
88.89% 
100.00% 
66.67% 
100.00% 
96.77% 
0.00% 
0.00% 
71.43% 
100.00% 
88.89% 
91.16% 
100.00% 
100.00% 
0.00% 
83.78% 
85.00% 
0.00% 


93.55% 
100.00% 
90.91% 
100.00% 
100.00% 
94.12% 
94.74% 
100.00% 
84.21% 
94.74% 
0.00% 
36.67% 
28.13% 
14.58% 
25.53% 
41.67% 
35.64% 
36.46% 
32.00% 
32.00% 
36.00% 
30.77% 
83.87% 
0.00% 
91.30% 
100.00% 
88.89% 
100.00% 
66.67% 
100.00% 
96.77% 
0.00% 
0.00% 
70.07% 
100.00% 
88.89% 
83.67% 
100.00% 
100.00% 
0.00% 
83.78% 
85.00% 
0.00% 


5/19/2014 2:00 ELA-PT-The Americ CITY GARDEN MOI 
5/19/2014 2:00 Math-PT-South Po CITY GARDEN MOI 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CITY GARDEN MOI 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CITY GARDEN MOI 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CITY GARDEN MOI 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CITY GARDEN MOI 
5/19/2014 2:00 SBAC-GO8-ELA-Noi CITY GARDEN MOI 
5/19/2014 2:00 SBAC-GO8-Math-N CITY GARDEN MOI 
5/19/2014 2:00 ELA-PT-Uncommo! CLARK CO. R-I 
5/19/2014 2:00 HS-ELA-PT-A New CLARK CO. R-I 
5/19/2014 2:00 HS-ELA-PT-A New CLARK CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CLARK CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CLARK CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CLARK CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CLARK CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi CLARK CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-Nonl CLARK CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-Nonl CLARK CO. R-I 
5/19/2014 2:00 HS-Math-PT-Great CLARKTON C-4 
5/19/2014 2:00 SBAC-HS-Math-No CLARKTON C-4 
5/19/2014 2:00 HS-Math-PT-Great CLEARWATER R-I 
5/19/2014 2:00 SBAC-HS-Math-No CLEARWATER R-I 
5/19/2014 2:00 ELA-PT-Growth an CLINTON CO. R-III 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CLINTON CO. R-III 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CLINTON CO. R-III 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CLINTON CO. R-III 
5/19/2014 2:00 SBAC-GO5-ELA-Noi CLINTON CO. R-III 


5/19/2014 2:00 HS-ELA-PT-A New COLE CO. R-V
5/19/2014 2:00 HS-ELA-PT-A New COLE CO. R-V
5/19/2014 2:00 HS-Math-PT-Great COLE CO. R-V
5/19/2014 2:00 SBAC-HS-ELA-NonI COLE CO. R-V
5/19/2014 2:00 SBAC-HS-ELA-NonI COLE CO. R-V
5/19/2014 2:00 SBAC-HS-ELA-NonI COLE CO. R-V
5/19/2014 2:00 SBAC-HS-Math-No COLE CO. R-V


5/19/2014 2:00 ELA-PT-Marine Ani COLUMBIA 93 
5/19/2014 2:00 ELA-PT-Uncommo: COLUMBIA 93 
5/19/2014 2:00 HS-Math-PT-Great COLUMBIA 93 
5/19/2014 2:00 Math-PT-Animal Jt COLUMBIA 93 
5/19/2014 2:00 Math-PT-Donuts COLUMBIA 93 
5/19/2014 2:00 Math-PT-Donuts-A COLUMBIA 93 
5/19/2014 2:00 Math-PT-Turtle Ha COLUMBIA 93 
5/19/2014 2:00 SBAC-GO3-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO3-Math-N COLUMBIA 93 




42 




40 




100.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
92.86% 
93.88% 
100.00% 
100.00% 
100.00% 
66.67% 
100.00% 
100.00% 
93.88% 
100.00% 
95.45% 
95.45% 
95.31% 
96.88% 
93.88% 
100.00% 
83.33% 
100.00% 
92.31% 
92.86% 
0.00% 
2.33% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 


100.00% 
0.00% 
100.00% 
83.33% 
100.00% 
100.00% 
0.00% 
0.00% 
92.86% 
93.88% 
100.00% 
100.00% 
100.00% 
66.67% 
100.00% 
100.00% 
91.84% 
100.00% 
95.45% 
95.45% 
95.31% 
96.88% 
91.84% 
100.00% 
83.33% 
100.00% 
92.31% 
76.19% 
0.00% 
0.00% 
95.24% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 


5/19/2014 2:00 SBAC-GO3-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-G04-ELA-No! COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-ELA-No! COLUMBIA 93 
5/19/2014 2:00 SBAC-GO04-ELA-No! COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-ELA-No! COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-ELA-No! COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO4-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO7-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO7-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-GO7-Math-N COLUMBIA 93 
5/19/2014 2:00 SBAC-HS-Math-No COLUMBIA 93 
5/19/2014 2:00 HS-Math-PT-Great CRANE R-III 

5/19/2014 2:00 SBAC-HS-Math-No CRANE R-III 


5/19/2014 2:00 HS-ELA-PT-A New CRAWFORD CO. 
5/19/2014 2:00 HS-ELA-PT-A New CRAWFORD CO. 
5/19/2014 2:00 HS-Math-PT-Great CRAWFORD CO. 
5/19/2014 2:00 SBAC-HS-ELA-NonI CRAWFORD CO. 
5/19/2014 2:00 SBAC-HS-ELA-Non| CRAWFORD CO. 
5/19/2014 2:00 SBAC-HS-Math-No CRAWFORD CO. 


5/19/2014 2:00 HS-ELA-PT-A New DADEVILLE R-II 
5/19/2014 2:00 HS-Math-PT-Great DADEVILLE R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI DADEVILLE R-II 
5/19/2014 2:00 SBAC-HS-Math-No DADEVILLE R-II 
5/19/2014 2:00 HS-ELA-PT-A New DALLAS CO. R-I 
5/19/2014 2:00 HS-ELA-PT-A New DALLAS CO. R-I 
5/19/2014 2:00 HS-Math-PT-Great DALLAS CO. R-I
5/19/2014 2:00 SBAC-HS-ELA-NonI DALLAS CO. R-I
5/19/2014 2:00 SBAC-HS-ELA-NonlI DALLAS CO. R-I
5/19/2014 2:00 SBAC-HS-ELA-Nonl DALLAS CO. R-I
5/19/2014 2:00 SBAC-HS-Math-No DALLAS CO. R-I 
5/19/2014 2:00 ELA-PT-Animals W DELTA C-7 
5/19/2014 2:00 ELA-PT-Technolog\ DELTA C-7 
5/19/2014 2:00 SBAC-GO3-ELA-No! DELTA C-7 
5/19/2014 2:00 SBAC-GO7-ELA-No! DELTA C-7 
5/19/2014 2:00 HS-ELA-PT-A New DELTA R-V 
5/19/2014 2:00 SBAC-HS-ELA-Nonl DELTA R-V 
5/19/2014 2:00 Math-PT-Donuts-A DESOTO 73 
5/19/2014 2:00 SBAC-GO6-Math-N DESOTO 73 
5/19/2014 2:00 SBAC-GO6-Math-N DESOTO 73 




0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
96.30% 
96.30% 
100.00% 
0.00% 
0.00% 
98.57% 
0.00% 
0.00% 
0.00% 
94.12% 
0.00% 
94.12% 
0.91% 
0.00% 
89.34% 
0.00% 
0.00% 
0.00% 
90.16% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
96.77% 
100.00% 
100.00% 


0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
96.30% 
96.30% 
100.00% 
0.00% 
0.00% 
98.57% 
0.00% 
0.00% 
0.00% 
94.12% 
0.00% 
94.12% 
0.00% 
0.00% 
89.34% 
0.00% 
0.00% 
0.00% 
87.70% 
100.00% 
100.00% 
64.29% 
100.00% 
100.00% 
100.00% 
96.77% 
100.00% 
95.65% 


5/19/2014 2:00 SBAC-GO6-Math-N DESOTO 73 
5/19/2014 2:00 SBAC-GO6-Math-N DESOTO 73 
5/19/2014 2:00 HS-ELA-PT-A New DUNKLIN R-V 
5/19/2014 2:00 Math-PT-Talent Sh DUNKLIN R-V 
5/19/2014 2:00 SBAC-GO5-Math-N DUNKLIN R-V 
5/19/2014 2:00 SBAC-GO5-Math-N DUNKLIN R-V 
5/19/2014 2:00 SBAC-GO5-Math-N DUNKLIN R-V 
5/19/2014 2:00 SBAC-GO5-Math-N DUNKLIN R-V 
5/19/2014 2:00 SBAC-HS-ELA-NonI DUNKLIN R-V 
5/19/2014 2:00 SBAC-HS-ELA-NonI DUNKLIN R-V 
5/19/2014 2:00 HS-ELA-PT-A New EAST CARTER CO. | 
5/19/2014 2:00 HS-Math-PT-Great EAST CARTER CO. | 
5/19/2014 2:00 SBAC-HS-ELA-Nonl EAST CARTER CO. | 
5/19/2014 2:00 SBAC-HS-Math-No EAST CARTER CO. | 
5/19/2014 2:00 Math-PT-South Po EAST LYNNE 40 
5/19/2014 2:00 SBAC-GO7-Math-N EAST LYNNE 40 
5/19/2014 2:00 SBAC-GO7-Math-N EAST LYNNE 40 
5/19/2014 2:00 SBAC-GO7-Math-N EAST LYNNE 40 
5/19/2014 2:00 SBAC-GO7-Math-N EAST LYNNE 40 
5/19/2014 2:00 HS-Math-PT-Great EAST NEWTON CO 
5/19/2014 2:00 HS-Math-PT-Great EAST NEWTON CO 
5/19/2014 2:00 SBAC-HS-Math-No EAST NEWTON CO 
5/19/2014 2:00 SBAC-HS-Math-No EAST NEWTON CO 
5/19/2014 2:00 HS-Math-PT-Great EAST PRAIRIE R-II 
5/19/2014 2:00 SBAC-HS-Math-No EAST PRAIRIE R-II 
5/19/2014 2:00 ELA-PT-Marine Ani EL DORADO SPRIN 
5/19/2014 2:00 SBAC-GO4-ELA-Noi EL DORADO SPRIN 
5/19/2014 2:00 SBAC-GO4-ELA-Noi EL DORADO SPRIN 
5/19/2014 2:00 SBAC-GO4-ELA-Noi EL DORADO SPRIN 
5/19/2014 2:00 SBAC-GO4-ELA-Noi EL DORADO SPRIN 
5/19/2014 2:00 SBAC-GO4-ELA-Noi EL DORADO SPRIN 
5/19/2014 2:00 Math-PT-South Po ELDON R-I 
5/19/2014 2:00 SBAC-GO8-Math-N ELDON R-I 
5/19/2014 2:00 HS-Math-PT-Great EVERTON R-III 
5/19/2014 2:00 SBAC-HS-Math-No EVERTON R-III 
5/19/2014 2:00 HS-ELA-PT-A New EXETER R-VI 
5/19/2014 2:00 HS-ELA-PT-A New EXETER R-VI 
5/19/2014 2:00 SBAC-HS-ELA-Nonl EXETER R-VI 
5/19/2014 2:00 SBAC-HS-ELA-Nonl EXETER R-VI 
5/19/2014 2:00 Math-PT-Camping: FAIRFAX R-III
5/19/2014 2:00 SBAC-GO7-Math-N FAIRFAX R-III
5/19/2014 2:00 SBAC-GO7-Math-N FAIRFAX R-III 
5/19/2014 2:00 SBAC-GO7-Math-N FAIRFAX R-III 




113 
140 


119 




119 


95.83% 
95.83% 
73.45% 
85.00% 
86.11% 
85.29% 
83.33% 
85.29% 
71.79% 
77.14% 
0.00% 
78.18% 
2.33% 
78.18% 
93.75% 
100.00% 
100.00% 
100.00% 
75.00% 
97.92% 
0.00% 
100.00% 
0.00% 
80.00% 
80.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
92.41% 
93.10% 
100.00% 
100.00% 
100.00% 
100.00% 
3.23% 
100.00% 
83.33% 
100.00% 
100.00% 
75.00% 


95.83% 
95.83% 
73.45% 
85.00% 
86.11% 
85.29% 
83.33% 
85.29% 
71.79% 
74.29% 
0.00% 
78.18% 
2.33% 
70.91% 
93.75% 
100.00% 
100.00% 
100.00% 
75.00% 
97.92% 
0.00% 
100.00% 
0.00% 
80.00% 
80.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
92.41% 
92.41% 
100.00% 
100.00% 
100.00% 
100.00% 
3.23% 
100.00% 
83.33% 
100.00% 
100.00% 
75.00% 


5/19/2014 2:00 SBAC-GO7-Math-N FAIRFAX R-III

5/19/2014 2:00 ELA-PT-Animals W FARMINGTON R-V 
5/19/2014 2:00 ELA-PT-Marine Ani FARMINGTON R-V 
5/19/2014 2:00 Math-PT-Commun FARMINGTON R-V 
5/19/2014 2:00 Math-PT-South Po FARMINGTON R-V 
5/19/2014 2:00 Math-PT-Turtle Ha FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO3-ELA-Noi FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO3-ELA-No! FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO3-ELA-No! FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO5-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO5-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO5-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO5-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO5-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO7-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO7-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO7-Math-N FARMINGTON R-V 
5/19/2014 2:00 SBAC-GO7-Math-N FARMINGTON R-V 
5/19/2014 2:00 ELA-PT-Growth an FERGUSON-FLORIS 
5/19/2014 2:00 Math-PT-Cell Phor FERGUSON-FLORIS 
5/19/2014 2:00 Math-PT-Cell Phor FERGUSON-FLORIS 
5/19/2014 2:00 Math-PT-Commun FERGUSON-FLORIS 
5/19/2014 2:00 Math-PT-Sandbox- FERGUSON-FLORIS 
5/19/2014 2:00 Math-PT-Turtle Ha FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO4-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO4-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO04-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO4-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO4-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO5-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO6-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO6-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO6-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO6-Math-N FERGUSON-FLORIS 
5/19/2014 2:00 SBAC-GO6-Math-N FERGUSON-FLORIS 




100.00% 
98.92% 
0.00% 
92.88% 
96.86% 
0.00% 
96.77% 
100.00% 
100.00% 
0.00% 
96.00% 
95.83% 
100.00% 
92.00% 
95.89% 
100.00% 
100.00% 
94.12% 
98.44% 
85.71% 
100.00% 
90.00% 
98.15% 
88.24% 
95.24% 
95.65% 
100.00% 
95.65% 
100.00% 
100.00% 
86.67% 
93.75% 
81.25% 
81.25% 
88.24% 
76.47% 
94.12% 
94.12% 
100.00% 
100.00% 
100.00% 
93.33% 
81.25% 


100.00% 
98.92% 
0.00% 
91.53% 
96.86% 
0.00% 
96.77% 
100.00% 
100.00% 
0.00% 
96.00% 
95.83% 
100.00% 
92.00% 
95.89% 
100.00% 
98.41% 
94.12% 
98.44% 
85.71% 
100.00% 
90.00% 
98.15% 
88.24% 
95.24% 
91.30% 
95.83% 
95.65% 
100.00% 
95.65% 
86.67% 
93.75% 
81.25% 
81.25% 
88.24% 
76.47% 
94.12% 
94.12% 
100.00% 
100.00% 
100.00% 
93.33% 
81.25% 


5/19/2014 2:00 HS-Math-PT-Great FESTUS R-VI 
5/19/2014 2:00 SBAC-HS-Math-No FESTUS R-VI 
5/19/2014 2:00 ELA-PT-Animals W FORT OSAGE R-I 
5/19/2014 2:00 ELA-PT-Marine Ani FORT OSAGE R-I 
5/19/2014 2:00 ELA-PT-Uncommo! FORT OSAGE R-I 
5/19/2014 2:00 ELA-PT-Uncommo! FORT OSAGE R-I 
5/19/2014 2:00 Math-PT-Animal Jt FORT OSAGE R-I 
5/19/2014 2:00 Math-PT-Animal Jt FORT OSAGE R-I 
5/19/2014 2:00 Math-PT-Making S FORT OSAGE R-I 
5/19/2014 2:00 Math-PT-Making S FORT OSAGE R-I 
5/19/2014 2:00 Math-PT-School Li FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G03-ELA-No! FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G03-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-GO3-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G03-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G03-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-ELA-No! FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-ELA-No! FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-ELA-No! FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-ELA-No: FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-ELA-No! FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-ELA-No: FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-G04-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-GO04-Math-N FORT OSAGE R-I 
5/19/2014 2:00 SBAC-GO04-Math-N FORT OSAGE R-I 
5/19/2014 2:00 ELA-PT-Growth an FOX C-6 
5/19/2014 2:00 ELA-PT-Trees FOX C-6 
5/19/2014 2:00 ELA-PT-Trees-A FOX C-6 
5/19/2014 2:00 Math-PT-Commun FOX C-6 
5/19/2014 2:00 Math-PT-South Po FOX C-6 
5/19/2014 2:00 SBAC-G0O3-ELA-Noi FOX C-6 
5/19/2014 2:00 SBAC-G0O3-ELA-Noi FOX C-6 
5/19/2014 2:00 SBAC-G0O3-ELA-Noi FOX C-6 
5/19/2014 2:00 SBAC-G0O3-ELA-Noi FOX C-6 
5/19/2014 2:00 SBAC-G04-Math-N FOX C-6 
5/19/2014 2:00 SBAC-G04-Math-N FOX C-6 
5/19/2014 2:00 SBAC-G04-Math-N FOX C-6 
5/19/2014 2:00 SBAC-G04-Math-N FOX C-6 
5/19/2014 2:00 SBAC-G04-Math-N FOX C-6 
5/19/2014 2:00 SBAC-G0O5-ELA-No! FOX C-6 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FOX C-6 
5/19/2014 2:00 SBAC-GO5-ELA-Noi FOX C-6 




254 
254 


191 


221 


107 


23.23% 
23.23% 
0.00% 
63.29% 
0.00% 
0.00% 
0.00% 
24.43% 
0.00% 
72.90% 
0.00% 
0.00% 
76.00% 
78.69% 
79.31% 
56.03% 
80.00% 
84.21% 
32.46% 
45.45% 
65.00% 
68.75% 
0.00% 
0.00% 
0.00% 
0.00% 
95.00% 
100.00% 
92.94% 
71.58% 
81.45% 
93.10% 
92.86% 
96.43% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
96.00% 
96.00% 
96.00% 


23.23% 
22.83% 
0.00% 
43.04% 
0.00% 
0.00% 
0.00% 
23.08% 
0.00% 
71.03% 
0.00% 
0.00% 
74.00% 
73.77% 
75.86% 
52.48% 
73.33% 
68.42% 
27.75% 
39.39% 
55.00% 
62.50% 
0.00% 
0.00% 
0.00% 
0.00% 
95.00% 
100.00% 
92.94% 
69.47% 
81.45% 
93.10% 
89.29% 
96.43% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
96.00% 
96.00% 
96.00% 


5/19/2014 2:00 SBAC-GO5-ELA-Noi FOX C-6 
5/19/2014 2:00 SBAC-GO8-Math-N FOX C-6 
5/19/2014 2:00 SBAC-G08-Math-N FOX C-6 
5/19/2014 2:00 SBAC-GO8-Math-N FOX C-6 
5/19/2014 2:00 HS-Math-PT-Great FRANCIS HOWELL 
5/19/2014 2:00 Math-PT-Order Fo FRANCIS HOWELL 
5/19/2014 2:00 SBAC-G04-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-G04-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-G04-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-G04-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-G04-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-GO08-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-GO8-Math-N FRANCIS HOWELL 
5/19/2014 2:00 SBAC-GO8-Math-N FRANCIS HOWELL 
5/19/2014 2:00 HS-Math-PT-Great FRONTIER SCHOOL 
5/19/2014 2:00 SBAC-HS-Math-No FRONTIER SCHOOL
5/19/2014 2:00 ELA-PT-Archeologi FULTON 58 
5/19/2014 2:00 HS-ELA-PT-A New FULTON 58 
5/19/2014 2:00 Math-PT-Commun FULTON 58 
5/19/2014 2:00 Math-PT-Order Fo FULTON 58 
5/19/2014 2:00 SBAC-G03-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G03-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G03-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G05-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G05-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G05-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G05-Math-N FULTON 58 
5/19/2014 2:00 SBAC-G0O7-ELA-No! FULTON 58 
5/19/2014 2:00 SBAC-G07-ELA-No! FULTON 58 
5/19/2014 2:00 SBAC-G07-ELA-No! FULTON 58 
5/19/2014 2:00 SBAC-G07-ELA-No! FULTON 58 
5/19/2014 2:00 SBAC-HS-ELA-Nonl FULTON 58 
5/19/2014 2:00 HS-Math-PT-Great GALLATIN R-V 
5/19/2014 2:00 SBAC-HS-Math-No GALLATIN R-V 
5/19/2014 2:00 Math-PT-Sandbox- GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO06-Math-N GASCONADE CO. 
5/19/2014 2:00 ELA-PT-Marine Ani GASCONADE CO.
5/19/2014 2:00 ELA-PT-Marine Ani GASCONADE CO. 
5/19/2014 2:00 HS-Math-PT-Great GASCONADE CO. 
5/19/2014 2:00 Math-PT-Cell Phor GASCONADE CO. 




100 


276 
105 


110 
110 


168 
147 


147 




92.00% 
76.53% 
84.00% 
86.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
93.88% 
100.00% 
100.00% 
75.00% 
91.67% 
100.00% 
99.03% 
0.00% 
99.25% 


92.00% 
76.53% 
84.00% 
86.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
93.88% 
100.00% 
100.00% 
75.00% 
91.67% 
100.00% 
99.03% 
0.00% 
99.25% 


5/19/2014 2:00 Math-PT-Talent Sh GASCONADE CO. 
5/19/2014 2:00 SBAC-GO04-ELA-Noi GASCONADE CO. 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GASCONADE CO. 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GASCONADE CO. 
5/19/2014 2:00 SBAC-GO4-ELA-No! GASCONADE CO. 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GASCONADE CO. 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GASCONADE CO. 
5/19/2014 2:00 SBAC-GO5-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO5-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO5-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO5-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-GO6-Math-N GASCONADE CO. 
5/19/2014 2:00 SBAC-HS-Math-No GASCONADE CO. 
5/19/2014 2:00 Math-PT-South Po GATEWAY SCIENCI 
5/19/2014 2:00 SBAC-GO8-Math-N GATEWAY SCIENCI 
5/19/2014 2:00 Math-PT-Talent Sh GILMAN CITY R-IV 
5/19/2014 2:00 SBAC-GO6-Math-N GILMAN CITY R-IV 
5/19/2014 2:00 SBAC-GO6-Math-N GILMAN CITY R-IV 
5/19/2014 2:00 SBAC-GO6-Math-N GILMAN CITY R-IV 
5/19/2014 2:00 HS-Math-PT-Great GOLDEN CITY R-III 
5/19/2014 2:00 SBAC-HS-Math-No GOLDEN CITY R-III 
5/19/2014 2:00 Math-PT-Talent Sh GRAIN VALLEY R-V 
5/19/2014 2:00 SBAC-GO05-Math-N GRAIN VALLEY R-V 
5/19/2014 2:00 SBAC-GO5-Math-N GRAIN VALLEY R-V 
5/19/2014 2:00 SBAC-GO5-Math-N GRAIN VALLEY R-V 
5/19/2014 2:00 SBAC-GO5-Math-N GRAIN VALLEY R-V 
5/19/2014 2:00 ELA-PT-Importanct GRAND CENTER AI 
5/19/2014 2:00 SBAC-GO6-ELA-Noi GRAND CENTER AI 
5/19/2014 2:00 SBAC-GO6-ELA-Noi GRAND CENTER AI 
5/19/2014 2:00 SBAC-GO6-ELA-Noi GRAND CENTER AI 
5/19/2014 2:00 SBAC-GO6-ELA-Noi GRAND CENTER AI 
5/19/2014 2:00 ELA-PT-Animals W GRANDVIEW C-4 

5/19/2014 2:00 Math-PT-Making S GRANDVIEW C-4 

5/19/2014 2:00 Math-PT-Order Fo GRANDVIEW C-4 

5/19/2014 2:00 SBAC-GO3-ELA-Noi GRANDVIEW C-4 

5/19/2014 2:00 SBAC-GO3-Math-N GRANDVIEW C-4 

5/19/2014 2:00 SBAC-GO3-Math-N GRANDVIEW C-4 

5/19/2014 2:00 SBAC-GO3-Math-N GRANDVIEW C-4 

5/19/2014 2:00 SBAC-GO3-Math-N GRANDVIEW C-4 

5/19/2014 2:00 ELA-PT-Uncommo! GRANDVIEW R-II 




94.74% 
100.00% 
100.00% 
100.00% 

95.24% 
100.00% 
100.00% 
100.00% 
100.00% 

90.91% 
100.00% 
100.00% 
100.00% 

97.06% 
100.00% 

0.00% 
0.00% 

90.91% 
100.00% 
100.00% 
100.00% 
100.00% 

83.33% 

83.33% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 

0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
92.11% 
0.00% 

96.00% 

84.00% 

92.31% 
100.00% 
100.00% 


94.74% 
100.00% 
100.00% 
100.00% 

95.24% 
100.00% 
100.00% 
100.00% 
100.00% 

90.91% 
100.00% 
100.00% 
100.00% 

97.06% 
100.00% 

0.00% 
0.00% 

56.82% 
100.00% 
100.00% 
100.00% 
100.00% 

83.33% 

83.33% 
100.00% 

90.91% 

95.45% 

95.45% 

91.30% 

0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
92.11% 
0.00% 

96.00% 

84.00% 

92.31% 
100.00% 
100.00% 


5/19/2014 2:00 ELA-PT-Uncommo! GRANDVIEW R-II 
5/19/2014 2:00 SBAC-G04-ELA-Noi GRANDVIEW R-II 
5/19/2014 2:00 SBAC-GO04-ELA-Noi GRANDVIEW R-II 
5/19/2014 2:00 SBAC-GO04-ELA-Noi GRANDVIEW R-II 
5/19/2014 2:00 SBAC-GO4-ELA-No! GRANDVIEW R-II 
5/19/2014 2:00 SBAC-G04-ELA-Noi GRANDVIEW R-II 
5/19/2014 2:00 SBAC-G04-ELA-No! GRANDVIEW R-II 
5/19/2014 2:00 Math-PT-Turtle Ha GREEN FOREST R-I 
5/19/2014 2:00 SBAC-GO5-Math-N GREEN FOREST R-I 
5/19/2014 2:00 SBAC-GO5-Math-N GREEN FOREST R-I 
5/19/2014 2:00 SBAC-GO5-Math-N GREEN FOREST R-I 
5/19/2014 2:00 SBAC-GO5-Math-N GREEN FOREST R-I 
5/19/2014 2:00 HS-ELA-PT-A New GREENFIELD R-IV 
5/19/2014 2:00 SBAC-HS-ELA-NonI GREENFIELD R-IV 
5/19/2014 2:00 SBAC-HS-ELA-NonI GREENFIELD R-IV 
5/19/2014 2:00 Math-PT-Making S GREENVILLE R-II 
5/19/2014 2:00 SBAC-GO3-Math-N GREENVILLE R-II 
5/19/2014 2:00 ELA-PT-Marine Ani GRUNDY CO. R-V 
5/19/2014 2:00 Math-PT-Camping: GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO4-ELA-Noi GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO7-Math-N GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO7-Math-N GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO7-Math-N GRUNDY CO. R-V 
5/19/2014 2:00 SBAC-GO7-Math-N GRUNDY CO. R-V 
5/19/2014 2:00 ELA-PT-Trees-A HALE R-I
5/19/2014 2:00 SBAC-GO4-ELA-Noi HALE R-I
5/19/2014 2:00 SBAC-GO4-ELA-No! HALE R-I
5/19/2014 2:00 SBAC-GO4-ELA-No! HALE R-I
5/19/2014 2:00 SBAC-GO4-ELA-No! HALE R-I
5/19/2014 2:00 SBAC-GO4-ELA-No! HALE R-I
5/19/2014 2:00 HS-ELA-PT-A New HALLSVILLE R-IV 
5/19/2014 2:00 SBAC-HS-ELA-NonlI HALLSVILLE R-IV 
5/19/2014 2:00 Math-PT-Cell Phor HAMILTON R-II 
5/19/2014 2:00 SBAC-GO7-Math-N HAMILTON R-II 
5/19/2014 2:00 SBAC-GO7-Math-N HAMILTON R-II 
5/19/2014 2:00 ELA-PT-Archeologi HANNIBAL 60 
5/19/2014 2:00 ELA-PT-Archeologi HANNIBAL 60 
5/19/2014 2:00 Math-PT-Order Fo HANNIBAL 60 
5/19/2014 2:00 SBAC-GO04-Math-N HANNIBAL 60 




13 






12 




96 


12 




61 


92.98% 
100.00% 
100.00% 
100.00% 
100.00% 

90.91% 

90.91% 

90.91% 

83.33% 

80.00% 

83.33% 
100.00% 

96.00% 
100.00% 

90.91% 

90.91% 

90.91% 
100.00% 

92.31% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 

64.29% 

33.33% 

33.33% 

0.00% 
33.33% 
50.00% 

0.93% 

89.72% 
100.00% 
100.00% 
100.00% 

66.67% 

33.20% 

97.30% 

93.33% 


92.98% 
91.67% 
100.00% 
100.00% 
91.67% 
90.91% 
81.82% 
90.91% 
83.33% 
80.00% 
83.33% 
100.00% 
96.00% 
85.71% 
90.91% 
90.91% 
90.91% 
100.00% 
92.31% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
50.00% 
0.00% 
33.33% 
0.00% 
0.00% 
50.00% 
0.93% 
57.01% 
100.00% 
100.00% 
100.00% 
66.67% 
24.51% 
94.59% 
93.33% 


5/19/2014 2:00 SBAC-G04-Math-N HANNIBAL 60 
5/19/2014 2:00 SBAC-GO04-Math-N HANNIBAL 60 
5/19/2014 2:00 SBAC-GO04-Math-N HANNIBAL 60 
5/19/2014 2:00 SBAC-GO04-Math-N HANNIBAL 60 
5/19/2014 2:00 SBAC-GO7-ELA-Noi HANNIBAL 60 
5/19/2014 2:00 SBAC-GO7-ELA-Noi HANNIBAL 60 
5/19/2014 2:00 SBAC-GO7-ELA-Noi HANNIBAL 60 
5/19/2014 2:00 SBAC-GO7-ELA-Noi HANNIBAL 60 
5/19/2014 2:00 SBAC-GO7-ELA-Noi HANNIBAL 60 


5/19/2014 2:00 HS-ELA-PT-A New HARDIN-CENTRAL 
5/19/2014 2:00 SBAC-HS-ELA-NonI HARDIN-CENTRAL 
5/19/2014 2:00 SBAC-HS-ELA-NonI HARDIN-CENTRAL 
5/19/2014 2:00 HS-Math-PT-Great HARRISBURG R-VII 
5/19/2014 2:00 SBAC-HS-Math-No HARRISBURG R-VII 
5/19/2014 2:00 HS-Math-PT-Great HAYTI R-II 
5/19/2014 2:00 SBAC-HS-Math-No HAYTI R-II 
5/19/2014 2:00 ELA-PT-Animals W HAZELWOOD 
5/19/2014 2:00 ELA-PT-Archeologi HAZELWOOD 
5/19/2014 2:00 ELA-PT-Archeologi HAZELWOOD 
5/19/2014 2:00 ELA-PT-Growth an HAZELWOOD 
5/19/2014 2:00 ELA-PT-Marine Ani HAZELWOOD 
5/19/2014 2:00 ELA-PT-Marine Ani HAZELWOOD 
5/19/2014 2:00 ELA-PT-Uncommo! HAZELWOOD 
5/19/2014 2:00 ELA-PT-Uncommo! HAZELWOOD 
5/19/2014 2:00 HS-ELA-PT-A New HAZELWOOD 
5/19/2014 2:00 Math-PT-Animal Jt HAZELWOOD 
5/19/2014 2:00 Math-PT-Animal Jt HAZELWOOD 
5/19/2014 2:00 Math-PT-Donuts HAZELWOOD 
5/19/2014 2:00 Math-PT-Donuts-A HAZELWOOD 
5/19/2014 2:00 Math-PT-Making S HAZELWOOD 
5/19/2014 2:00 Math-PT-Making S HAZELWOOD 
5/19/2014 2:00 Math-PT-Talent Sh HAZELWOOD 
5/19/2014 2:00 Math-PT-Talent Sh HAZELWOOD 


5/19/2014 2:00 SBAC-GO3-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO3-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO3-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO3-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO4-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-G04-ELA-No! HAZELWOOD 
5/19/2014 2:00 SBAC-G04-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO4-ELA-No! HAZELWOOD 
5/19/2014 2:00 SBAC-GO04-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO04-Math-N HAZELWOOD 






40 


93.33% 
100.00% 
100.00% 
100.00% 

32.81% 

44.44% 

66.67% 

39.68% 

22.22% 

0.00% 
0.00% 
0.00% 
0.00% 
0.00% 

76.92% 

76.92% 

96.79% 

89.71% 

95.45% 

0.00% 

85.33% 

81.67% 

72.18% 

86.84% 

53.24% 

91.89% 

94.79% 

88.62% 

66.67% 

45.54% 

35.90% 

88.37% 

84.96% 

96.79% 

53.33% 

29.17% 

46.53% 
100.00% 

95.11% 

87.50% 

92.31% 
100.00% 

91.89% 


93.33% 
93.33% 
100.00% 
100.00% 
0.00% 
7.94% 
33.33% 
1.59% 
3.17% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
76.92% 
76.92% 
94.87% 
67.65% 
77.27% 
0.00% 
80.00% 
78.33% 
67.67% 
76.32% 
52.16% 
91.89% 
93.75% 
84.14% 
66.67% 
45.54% 
35.90% 
86.30% 
83.19% 
95.51% 
53.33% 
29.17% 
46.53% 
88.89% 
82.71% 
87.50% 
76.92% 
100.00% 
89.19% 


5/19/2014 2:00 SBAC-G04-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO4-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO5-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO5-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO5-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO5-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO5-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO6-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO6-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO6-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO6-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO6-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO6-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO7-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO7-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO7-Math-N HAZELWOOD 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HAZELWOOD 
5/19/2014 2:00 SBAC-HS-ELA-NonI HAZELWOOD 
5/19/2014 2:00 SBAC-HS-ELA-NonI HAZELWOOD 
5/19/2014 2:00 HS-ELA-PT-A New HENRY CO. R-I 
5/19/2014 2:00 HS-Math-PT-Great HENRY CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI HENRY CO. R-I 
5/19/2014 2:00 SBAC-HS-Math-No HENRY CO. R-I 
5/19/2014 2:00 Math-PT-South Po HIGBEE R-VIII 
5/19/2014 2:00 SBAC-GO8-Math-N HIGBEE R-VIII 
5/19/2014 2:00 SBAC-GO8-Math-N HIGBEE R-VIII 
5/19/2014 2:00 SBAC-GO8-Math-N HIGBEE R-VIII 
5/19/2014 2:00 SBAC-GO8-Math-N HIGBEE R-VIII 
5/19/2014 2:00 SBAC-GO8-Math-N HIGBEE R-VIII 
5/19/2014 2:00 HS-ELA-PT-A New HILLSBORO R-III 
5/19/2014 2:00 SBAC-HS-ELA-NonI HILLSBORO R-III 
5/19/2014 2:00 SBAC-HS-ELA-NonI HILLSBORO R-III 
5/19/2014 2:00 HS-ELA-PT-A New HOLCOMB R-III 
5/19/2014 2:00 HS-ELA-PT-A New HOLCOMB R-III 
5/19/2014 2:00 HS-Math-PT-Great HOLCOMB R-III 
5/19/2014 2:00 SBAC-HS-ELA-NonI HOLCOMB R-III 
5/19/2014 2:00 SBAC-HS-ELA-NonI HOLCOMB R-III 
5/19/2014 2:00 SBAC-HS-ELA-NonI HOLCOMB R-III 
5/19/2014 2:00 SBAC-HS-Math-No HOLCOMB R-III 




19 




39 




19 




39 




100.00% 
97.87% 
93.33% 
86.96% 
97.33% 

100.00% 
90.91% 

0.00% 

100.00% 

100.00% 
91.73% 
86.84% 
80.95% 
88.97% 

100.00% 

100.00% 

100.00% 

100.00% 
94.61% 
85.71% 

100.00% 
52.63% 
62.22% 

0.00% 

100.00% 

0.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

0.00% 
0.00% 
0.00% 
82.05% 
0.00% 
0.00% 
100.00% 
0.00% 
0.00% 
2.56% 


95.83% 
91.49% 
86.67% 
73.91% 
89.33% 
90.91% 
72.73% 
0.00% 
100.00% 
92.31% 
86.30% 
73.68% 
78.57% 
76.21% 
100.00% 
100.00% 
0.00% 
100.00% 
76.96% 
57.14% 
63.64% 
47.89% 
61.11% 
0.00% 
100.00% 
0.00% 
98.31% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
82.05% 
0.00% 
0.00% 
100.00% 
0.00% 
0.00% 
0.00% 


5/19/2014 2:00 HS-ELA-PT-A New HOLDEN R-III 
5/19/2014 2:00 HS-ELA-PT-A New HOLDEN R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HOLDEN R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HOLDEN R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HOLDEN R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HOLDEN R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi HOLDEN R-III 
5/19/2014 2:00 SBAC-GO8-ELA-No! HOLDEN R-III 
5/19/2014 2:00 HS-Math-PT-Great HUMANSVILLE R-I' 
5/19/2014 2:00 SBAC-GO8-Math-N HUMANSVILLE R-I' 
5/19/2014 2:00 SBAC-GO8-Math-N HUMANSVILLE R-I' 
5/19/2014 2:00 SBAC-GO8-Math-N HUMANSVILLE R-I' 
5/19/2014 2:00 SBAC-GO8-Math-N HUMANSVILLE R-I' 
5/19/2014 2:00 SBAC-GO8-Math-N HUMANSVILLE R-I' 
5/19/2014 2:00 Math-PT-South Po IBERIA R-V 
5/19/2014 2:00 SBAC-GO7-Math-N IBERIA R-V 
5/19/2014 2:00 SBAC-GO7-Math-N IBERIA R-V 
5/19/2014 2:00 SBAC-GO7-Math-N IBERIA R-V 
5/19/2014 2:00 SBAC-GO7-Math-N IBERIA R-V 
5/19/2014 2:00 ELA-PT-Growth an INDEPENDENCE 3( 
5/19/2014 2:00 ELA-PT-Marine Ani INDEPENDENCE 3( 
5/19/2014 2:00 ELA-PT-Marine Ani INDEPENDENCE 3¢ 
5/19/2014 2:00 ELA-PT-The Americ INDEPENDENCE 3( 
5/19/2014 2:00 ELA-PT-The Americ INDEPENDENCE 3( 
5/19/2014 2:00 ELA-PT-Uncommo! INDEPENDENCE 3( 
5/19/2014 2:00 Math-PT-Animal Jt INDEPENDENCE 3¢ 
5/19/2014 2:00 Math-PT-Order Fo INDEPENDENCE 3( 
5/19/2014 2:00 Math-PT-Turtle Ha INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-GO3-Math-N INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-GO3-Math-N INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-GO3-Math-N INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-GO4-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO4-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO4-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO4-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO4-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-G04-Math-N INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-G04-Math-N INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-G04-Math-N INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-G04-Math-N INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-G04-Math-N INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO5-ELA-No! INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-GO5-ELA-No: INDEPENDENCE 3¢ 




0.00% 
94.59% 
82.61% 

100.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
94.87% 
100.00% 
100.00% 
90.91% 
100.00% 
96.10% 

0.00% 

98.00% 

100.00% 
44.87% 
89.87% 
91.78% 
97.62% 

0.00% 

92.86% 
100.00% 
100.00% 
100.00% 

95.24% 
100.00% 

95.24% 

95.83% 

80.00% 
100.00% 
100.00% 

93.33% 

92.86% 
100.00% 

94.44% 


0.00% 
87.39% 
78.26% 
71.43% 
0.00% 
91.30% 
86.36% 
72.73% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 

94.87% 
100.00% 
100.00% 

90.91% 
100.00% 

89.61% 

0.00% 

96.00% 

100.00% 
44.55% 
87.34% 
91.78% 
97.62% 

0.00% 

92.86% 
100.00% 
100.00% 
100.00% 

95.24% 

95.45% 

95.24% 

91.67% 

80.00% 
100.00% 
100.00% 

93.33% 

92.86% 
100.00% 

91.67% 


5/19/2014 2:00 SBAC-GO5-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO5-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO5-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO05-Math-N INDEPENDENCE 3( 
5/19/2014 2:00 SBAC-GO6-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO6-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO6-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO6-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 SBAC-GO6-ELA-No! INDEPENDENCE 3¢ 
5/19/2014 2:00 HS-ELA-PT-A New IRON CO. C-4 
5/19/2014 2:00 HS-Math-PT-Great IRON CO. C-4 
5/19/2014 2:00 SBAC-HS-ELA-Nonl IRON CO. C-4 
5/19/2014 2:00 SBAC-HS-Math-No IRON CO. C-4 
5/19/2014 2:00 ELA-PT-Growth an JACKSON R-II 
5/19/2014 2:00 ELA-PT-Inventions JACKSON R-II 
5/19/2014 2:00 Math-PT-Talent Sh JACKSON R-II 
5/19/2014 2:00 SBAC-GO6-ELA-No! JACKSON R-II 
5/19/2014 2:00 SBAC-GO6-Math-N JACKSON R-II 
5/19/2014 2:00 ELA-PT-Archeologi JAMESTOWN C-1 
5/19/2014 2:00 Math-PT-Cell Phor JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO6-Math-N JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO6-Math-N JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO6-Math-N JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO6-Math-N JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO8-ELA-Noi JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO8-ELA-Noi JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO8-ELA-Noi JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO8-ELA-Noi JAMESTOWN C-1 
5/19/2014 2:00 SBAC-GO8-ELA-Noi JAMESTOWN C-1 
5/19/2014 2:00 ELA-PT-Marine Ani JEFFERSON CITY 
5/19/2014 2:00 ELA-PT-Marine Ani JEFFERSON CITY 
5/19/2014 2:00 ELA-PT-Technolog\ JEFFERSON CITY 
5/19/2014 2:00 ELA-PT-Technolog\ JEFFERSON CITY 
5/19/2014 2:00 Math-PT-Order Fo JEFFERSON CITY 
5/19/2014 2:00 Math-PT-Order Fo JEFFERSON CITY 
5/19/2014 2:00 Math-PT-Turtle Ha JEFFERSON CITY 
5/19/2014 2:00 Math-PT-Turtle Ha JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-ELA-Noi JEFFERSON CITY 




17 




72 


17 




39 


15 




11 


11 


100.00% 
97.30% 
100.00% 
0.00% 
46.15% 
43.59% 
0.00% 
48.72% 
46.15% 
89.29% 
0.00% 
100.00% 
0.00% 
0.00% 
96.02% 
0.00% 
96.34% 
0.00% 
80.95% 
0.00% 
100.00% 
100.00% 
100.00% 
80.00% 
100.00% 
75.00% 
100.00% 
75.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
46.43% 
100.00% 
78.57% 
100.00% 
86.67% 
92.86% 
93.33% 


100.00% 
94.59% 
100.00% 
0.00% 
42.31% 
42.31% 
0.00% 
48.72% 
44.87% 
89.29% 
0.00% 
100.00% 
0.00% 
0.00% 
96.02% 
0.00% 
96.06% 
0.00% 
71.43% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
75.00% 
100.00% 
75.00% 
75.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
42.86% 
92.86% 
57.14% 
75.00% 
73.33% 
57.14% 
73.33% 


5/19/2014 2:00 SBAC-G04-Math-N JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-Math-N JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-Math-N JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-Math-N JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO4-Math-N JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO04-Math-N JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO7-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO7-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO7-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO7-ELA-Noi JEFFERSON CITY 
5/19/2014 2:00 SBAC-GO7-ELA-Noi JEFFERSON CITY 


5/19/2014 2:00 Math-PT-Animal Ju JEFFERSON CO. R-' 
5/19/2014 2:00 Math-PT-Animal Jt JEFFERSON CO. R-' 
5/19/2014 2:00 SBAC-GO04-Math-N JEFFERSON CO. R-' 
5/19/2014 2:00 SBAC-G04-Math-N JEFFERSON CO. R-' 
5/19/2014 2:00 SBAC-G04-Math-N JEFFERSON CO. R-' 
5/19/2014 2:00 SBAC-GO4-Math-N JEFFERSON CO. R-' 
5/19/2014 2:00 SBAC-G04-Math-N JEFFERSON CO. R-' 
5/19/2014 2:00 SBAC-G04-Math-N JEFFERSON CO. R-' 


5/19/2014 2:00 HS-Math-PT-Great JENNINGS 

5/19/2014 2:00 SBAC-HS-Math-No JENNINGS 

5/19/2014 2:00 ELA-PT-Growth an JOPLIN SCHOOLS 
5/19/2014 2:00 ELA-PT-Marine Ani JOPLIN SCHOOLS 
5/19/2014 2:00 ELA-PT-Trees-A JOPLIN SCHOOLS 
5/19/2014 2:00 Math-PT-Animal Jt JOPLIN SCHOOLS 
5/19/2014 2:00 Math-PT-Commun JOPLIN SCHOOLS 
5/19/2014 2:00 Math-PT-Order Fo JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO3-ELA-Noi JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO3-ELA-Noi JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO3-ELA-No! JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO3-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO3-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO3-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO4-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO4-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO04-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO4-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO04-Math-N JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO5-ELA-Noi JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO6-ELA-Noi JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO6-ELA-No! JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO6-ELA-Noi JOPLIN SCHOOLS 
5/19/2014 2:00 SBAC-GO6-ELA-Noi JOPLIN SCHOOLS 




400 
114 


114 
100 
100 
100 
100 


311 


309 


74.19% 
72.73% 
60.00% 
75.00% 
62.50% 
78.13% 
94.74% 
88.00% 
100.00% 
93.33% 
93.33% 
0.00% 
86.21% 
88.24% 
82.35% 
50.00% 
88.89% 
88.24% 
94.44% 
79.31% 
82.76% 
77.15% 
53.51% 
90.24% 
91.23% 
86.75% 
92.45% 
84.62% 
85.71% 
100.00% 
100.00% 
88.89% 
100.00% 
92.86% 
89.29% 
89.29% 
85.71% 
89.29% 
53.51% 
80.00% 
75.00% 
83.00% 
84.00% 


64.52% 
72.73% 
60.00% 
75.00% 
62.50% 
71.88% 
90.79% 
88.00% 
100.00% 
92.00% 
93.33% 
0.00% 
86.21% 
88.24% 
70.59% 
50.00% 
88.89% 
88.24% 
77.18% 
79.31% 
82.76% 
77.25% 
53.51% 
90.24% 
91.23% 
86.75% 
92.45% 
84.62% 
85.71% 
100.00% 
100.00% 
88.89% 
100.00% 
92.86% 
89.29% 
89.29% 
85.71% 
89.29% 
52.63% 
76.00% 
73.00% 
80.00% 
82.00% 


5/19/2014 2:00 HS-Math-PT-Great KANSAS CITY 33 
5/19/2014 2:00 Math-PT-Commun KANSAS CITY 33 
5/19/2014 2:00 SBAC-GO5-Math-N KANSAS CITY 33 
5/19/2014 2:00 SBAC-GO5-Math-N KANSAS CITY 33 
5/19/2014 2:00 SBAC-GO5-Math-N KANSAS CITY 33 
5/19/2014 2:00 SBAC-GO5-Math-N KANSAS CITY 33 
5/19/2014 2:00 SBAC-HS-Math-No KANSAS CITY 33 
5/19/2014 2:00 ELA-PT-Trees KEARNEY R-I 
5/19/2014 2:00 SBAC-GO3-ELA-Noi KEARNEY R-I 
5/19/2014 2:00 Math-PT-South Po KELSO C-7 
5/19/2014 2:00 SBAC-GO8-Math-N KELSO C-7 
5/19/2014 2:00 SBAC-GO8-Math-N KELSO C-7 
5/19/2014 2:00 SBAC-GO8-Math-N KELSO C-7 
5/19/2014 2:00 ELA-PT-Heatwaves KENNETT 39 
5/19/2014 2:00 HS-Math-PT-Great KENNETT 39 
5/19/2014 2:00 SBAC-GO3-ELA-No! KENNETT 39 
5/19/2014 2:00 SBAC-HS-Math-No KENNETT 39 
5/19/2014 2:00 ELA-PT-Archeologi KEYTESVILLE R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi KEYTESVILLE R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi KEYTESVILLE R-III 
5/19/2014 2:00 SBAC-GO8-ELA-Noi KEYTESVILLE R-III 
5/19/2014 2:00 SBAC-GO8-ELA-No! KEYTESVILLE R-III 
5/19/2014 2:00 SBAC-GO8-ELA-No! KEYTESVILLE R-III 
5/19/2014 2:00 Math-PT-Animal Ju KINGSVILLE R-I 
5/19/2014 2:00 SBAC-GO4-Math-N KINGSVILLE R-I 
5/19/2014 2:00 SBAC-GO4-Math-N KINGSVILLE R-I 
5/19/2014 2:00 SBAC-GO04-Math-N KINGSVILLE R-I 
5/19/2014 2:00 SBAC-GO04-Math-N KINGSVILLE R-I 
5/19/2014 2:00 SBAC-GO04-Math-N KINGSVILLE R-I 
5/19/2014 2:00 Math-PT-South Po KIPP ST LOUIS 
5/19/2014 2:00 SBAC-GO8-Math-N KIPP ST LOUIS 
5/19/2014 2:00 ELA-PT-Trees KIRKSVILLE R-III 
5/19/2014 2:00 ELA-PT-Trees-A KIRKSVILLE R-III 
5/19/2014 2:00 Math-PT-Baseball KIRKSVILLE R-III 
5/19/2014 2:00 Math-PT-Baseball- KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-G04-ELA-Noi KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO4-ELA-Noi KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO4-ELA-Noi KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-G04-ELA-Noi KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO4-ELA-Noi KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO4-ELA-No! KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO8-Math-N KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO8-Math-N KIRKSVILLE R-III 




302 




2 




0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
71.70% 
98.11% 
100.00% 
100.00% 
100.00% 
100.00% 
90.96% 
79.69% 
93.79% 
84.38% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
75.00% 
88.37% 
33.33% 
79.17% 
80.00% 
85.29% 
75.00% 
91.43% 
91.18% 
94.12% 
0.00% 
0.00% 


0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
52.83% 
88.68% 
100.00% 
100.00% 
100.00% 
100.00% 
90.96% 
79.69% 
93.22% 
84.38% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
75.00% 
88.37% 
33.33% 
79.17% 
80.00% 
85.29% 
75.00% 
91.43% 
91.18% 
94.12% 
0.00% 
0.00% 


5/19/2014 2:00 SBAC-GO8-Math-N KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO8-Math-N KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO8-Math-N KIRKSVILLE R-III 
5/19/2014 2:00 SBAC-GO8-Math-N KIRKSVILLE R-III 
5/19/2014 2:00 ELA-PT-Animals W KNOB NOSTER R-V 
5/19/2014 2:00 SBAC-GO3-ELA-Noi KNOB NOSTER R-V 
5/19/2014 2:00 HS-ELA-PT-A New KNOX CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI KNOX CO. R-I 
5/19/2014 2:00 HS-ELA-PT-A New LA MONTE R-IV 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LA MONTE R-IV 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LA MONTE R-IV 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LA MONTE R-IV 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LA MONTE R-IV 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LA MONTE R-IV 
5/19/2014 2:00 HS-Math-PT-Great LA PLATA R-II 
5/19/2014 2:00 SBAC-HS-Math-No LA PLATA R-II 
5/19/2014 2:00 Math-PT-Fitness C LACLEDE CO. C-5 
5/19/2014 2:00 SBAC-GO3-Math-N LACLEDE CO. C-5 
5/19/2014 2:00 HS-ELA-PT-A New LACLEDE CO. R-I 
5/19/2014 2:00 HS-ELA-PT-A New LACLEDE CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI LACLEDE CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI LACLEDE CO. R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI LACLEDE CO. R-I 


5/19/2014 2:00 HS-ELA-PT-A New LAFAYETTE CO. C-: 
5/19/2014 2:00 Math-PT-Science K LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-GO3-Math-N LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-HS-ELA-Nonl LAFAYETTE CO. C-: 
5/19/2014 2:00 SBAC-HS-ELA-NonI LAFAYETTE CO. C-: 


5/19/2014 2:00 HS-ELA-PT-A New LAMAR R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAMAR R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAMAR R-I 
5/19/2014 2:00 SBAC-GO8-ELA-No! LAMAR R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LAMAR R-I 
5/19/2014 2:00 SBAC-GO8-ELA-No! LAMAR R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI LAMAR R-I 
5/19/2014 2:00 SBAC-HS-ELA-NonI LAMAR R-I 
5/19/2014 2:00 HS-ELA-PT-A New LAQUEY R-V 
5/19/2014 2:00 SBAC-HS-ELA-NonI LAQUEY R-V 






199 


26 




22 


26 




22 


33.33% 
85.33% 
84.93% 
71.79% 
94.83% 
93.10% 
0.00% 
0.00% 
95.45% 
100.00% 
100.00% 
100.00% 
80.00% 
100.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
94.92% 
100.00% 
96.88% 
96.30% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
95.45% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
70.83% 
70.83% 


33.33% 
85.33% 
84.93% 
71.79% 
94.83% 
91.38% 
0.00% 
0.00% 
90.91% 
100.00% 
100.00% 
100.00% 
80.00% 
100.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
94.92% 
100.00% 
96.88% 
96.30% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
86.36% 
100.00% 
95.24% 
100.00% 
90.48% 
0.00% 
0.00% 
62.50% 
70.83% 


5/19/2014 2:00 ELA-PT-Growth an LATHROP R-II 
5/19/2014 2:00 HS-ELA-PT-A New LATHROP R-II 
5/19/2014 2:00 HS-ELA-PT-A New LATHROP R-II 
5/19/2014 2:00 SBAC-GO6-ELA-Noi LATHROP R-II 
5/19/2014 2:00 SBAC-GO6-ELA-Noi LATHROP R-II 
5/19/2014 2:00 SBAC-GO6-ELA-Noi LATHROP R-II 
5/19/2014 2:00 SBAC-GO6-ELA-Noi LATHROP R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI LATHROP R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI LATHROP R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI LATHROP R-II 
5/19/2014 2:00 ELA-PT-Archeologi LEWIS CO. C-1 
5/19/2014 2:00 Math-PT-Animal Ju LEWIS CO. C-1 
5/19/2014 2:00 SBAC-G04-Math-N LEWIS CO. C-1 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LEWIS CO. C-1 
5/19/2014 2:00 Math-PT-Making S LEXINGTON R-V 
5/19/2014 2:00 SBAC-GO3-Math-N LEXINGTON R-V 
5/19/2014 2:00 SBAC-GO3-Math-N LEXINGTON R-V 
5/19/2014 2:00 SBAC-GO3-Math-N LEXINGTON R-V 
5/19/2014 2:00 HS-Math-PT-Great LIBERAL R-II 
5/19/2014 2:00 SBAC-HS-Math-No LIBERAL R-II 
5/19/2014 2:00 Math-PT-Order Fo LIBERTY 53 
5/19/2014 2:00 SBAC-GO3-Math-N LIBERTY 53 
5/19/2014 2:00 SBAC-GO3-Math-N LIBERTY 53 
5/19/2014 2:00 SBAC-GO3-Math-N LIBERTY 53 
5/19/2014 2:00 HS-ELA-PT-A New LICKING R-VIII 
5/19/2014 2:00 HS-ELA-PT-A New LICKING R-VIII 
5/19/2014 2:00 SBAC-GO8-ELA-No! LICKING R-VIII 
5/19/2014 2:00 SBAC-GO8-ELA-No! LICKING R-VIII 
5/19/2014 2:00 SBAC-GO8-ELA-No! LICKING R-VIII 
5/19/2014 2:00 SBAC-GO8-ELA-No! LICKING R-VIII 
5/19/2014 2:00 SBAC-GO8-ELA-No! LICKING R-VIII 
5/19/2014 2:00 SBAC-GO8-ELA-No! LICKING R-VIII 
5/19/2014 2:00 HS-Math-PT-Great LIFT FOR LIFE ACAI 
5/19/2014 2:00 SBAC-HS-Math-No LIFT FOR LIFE ACAI 
5/19/2014 2:00 HS-ELA-PT-A New LINCOLN R-II 
5/19/2014 2:00 HS-Math-PT-Great LINCOLN R-II 
5/19/2014 2:00 SBAC-HS-ELA-NonI LINCOLN R-II 
5/19/2014 2:00 SBAC-HS-Math-No LINCOLN R-II 
5/19/2014 2:00 Math-PT-Camping: LINDBERGH SCHO( 
5/19/2014 2:00 SBAC-GO8-Math-N LINDBERGH SCHO\ 
5/19/2014 2:00 SBAC-GO8-Math-N LINDBERGH SCHO( 
5/19/2014 2:00 SBAC-GO8-Math-N LINDBERGH SCHO\ 
5/19/2014 2:00 HS-Math-PT-Great LINN CO. R-I 




200 


95.65% 
83.33% 
0.00% 
95.83% 
100.00% 
100.00% 
100.00% 
96.67% 
0.00% 
0.00% 
89.87% 
91.18% 
92.65% 
98.73% 
0.00% 
79.17% 
92.00% 
100.00% 
51.43% 
0.00% 
88.54% 
96.77% 
87.88% 
93.75% 
100.00% 
93.44% 
91.67% 
91.67% 
100.00% 
100.00% 
100.00% 
92.31% 
91.03% 
88.46% 
0.00% 
100.00% 
0.00% 
0.00% 
90.50% 
90.00% 
90.00% 
97.50% 
83.33% 


95.65% 
83.33% 
0.00% 
91.67% 
95.65% 
100.00% 
100.00% 
96.67% 
0.00% 
0.00% 
88.61% 
69.12% 
89.71% 
97.47% 
0.00% 
75.00% 
88.00% 
95.65% 
51.43% 
0.00% 
87.50% 
96.77% 
87.88% 
81.25% 
100.00% 
93.44% 
91.67% 
91.67% 
100.00% 
100.00% 
100.00% 
92.31% 
91.03% 
87.18% 
0.00% 
100.00% 
0.00% 
0.00% 
89.00% 
90.00% 
90.00% 
95.00% 
83.33% 


5/19/2014 2:00 SBAC-HS-Math-No LINN CO. R-I 
5/19/2014 2:00 ELA-PT-Renewable LOCKWOOD R-I 
5/19/2014 2:00 ELA-PT-Renewable LOCKWOOD R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LOCKWOOD R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LOCKWOOD R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LOCKWOOD R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LOCKWOOD R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LOCKWOOD R-I 
5/19/2014 2:00 SBAC-GO8-ELA-Noi LOCKWOOD R-I 
5/19/2014 2:00 ELA-PT-Uncommoi LONE JACK C-6 
5/19/2014 2:00 ELA-PT-Uncommoi LONE JACK C-6 
5/19/2014 2:00 SBAC-GO4-ELA-Noi LONE JACK C-6 
5/19/2014 2:00 SBAC-GO4-ELA-Noi LONE JACK C-6 
5/19/2014 2:00 SBAC-GO4-ELA-Noi LONE JACK C-6 
5/19/2014 2:00 ELA-PT-Trees LOUISIANA R-II 
5/19/2014 2:00 ELA-PT-Trees-A LOUISIANA R-II 
5/19/2014 2:00 SBAC-GO3-ELA-Noi LOUISIANA R-II 
5/19/2014 2:00 SBAC-GO3-ELA-Noi LOUISIANA R-II 
5/19/2014 2:00 SBAC-GO3-ELA-Noi LOUISIANA R-II 
5/19/2014 2:00 SBAC-GO3-ELA-No LOUISIANA R-II 
5/19/2014 2:00 ELA-PT-Uncommoi MACON CO. R-I 
5/19/2014 2:00 ELA-PT-Uncommo! MACON CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MACON CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MACON CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MACON CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MACON CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MACON CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MACON CO. R-I 
5/19/2014 2:00 HS-Math-PT-Great MACON CO. R-IV 
5/19/2014 2:00 SBAC-HS-Math-No MACON CO. R-IV 
5/19/2014 2:00 ELA-PT-Importanct MALDEN R-I 
5/19/2014 2:00 SBAC-GO6-ELA-Noi MALDEN R-I 
5/19/2014 2:00 HS-Math-PT-Great MAPLEWOOD-RIC| 
5/19/2014 2:00 Math-PT-Donuts MAPLEWOOD-RIC! 
5/19/2014 2:00 Math-PT-Donuts-A MAPLEWOOD-RIC| 
5/19/2014 2:00 Math-PT-Making S MAPLEWOOD-RIC! 
5/19/2014 2:00 Math-PT-Making S MAPLEWOOD-RIC! 
5/19/2014 2:00 SBAC-GO3-Math-N MAPLEWOOD-RIC| 
5/19/2014 2:00 SBAC-GO3-Math-N MAPLEWOOD-RIC! 
5/19/2014 2:00 SBAC-GO3-Math-N MAPLEWOOD-RICI 
5/19/2014 2:00 SBAC-GO7-Math-N MAPLEWOOD-RIC! 
5/19/2014 2:00 SBAC-GO7-Math-N MAPLEWOOD-RIC| 
5/19/2014 2:00 SBAC-HS-Math-No MAPLEWOOD-RIC| 




83.33% 
100.00% 
91.18% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
85.71% 
97.73% 
100.00% 
95.45% 
100.00% 
100.00% 
100.00% 
84.85% 
71.43% 
95.65% 
95.45% 
100.00% 
0.00% 
92.08% 
90.48% 
95.00% 
50.00% 
100.00% 
100.00% 
95.00% 
100.00% 
100.00% 
80.56% 
80.56% 
0.00% 
80.23% 
100.00% 
83.82% 
82.93% 
92.86% 
85.19% 
77.94% 
84.88% 
100.00% 
0.00% 


83.33% 
100.00% 
88.24% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
85.71% 
90.91% 
80.00% 
93.18% 
100.00% 
100.00% 
100.00% 
84.85% 
71.43% 
95.65% 
95.45% 
100.00% 
0.00% 
90.10% 
90.48% 
85.00% 
50.00% 
90.00% 
95.00% 
95.00% 
100.00% 
100.00% 
76.39% 
80.56% 
0.00% 
80.23% 
100.00% 
83.82% 
82.93% 
92.86% 
85.19% 
76.47% 
83.72% 
100.00% 
0.00% 


5/19/2014 2:00 Math-PT-Donuts MARCELINE R-V 
5/19/2014 2:00 Math-PT-Donuts-A MARCELINE R-V 
5/19/2014 2:00 SBAC-GO7-Math-N MARCELINE R-V 
5/19/2014 2:00 SBAC-GO7-Math-N MARCELINE R-V 
5/19/2014 2:00 ELA-PT-Uncommoi MARIES CO. R-I 
5/19/2014 2:00 HS-Math-PT-Great MARIES CO. R-I 
5/19/2014 2:00 Math-PT-Animal Jt MARIES CO. R-I 
5/19/2014 2:00 SBAC-GO4-ELA-No: MARIES CO. R-I 
5/19/2014 2:00 SBAC-GO4-Math-N MARIES CO. R-I 
5/19/2014 2:00 SBAC-HS-Math-No MARIES CO. R-I 
5/19/2014 2:00 ELA-PT-Archeologi MARIES CO. R-II 
5/19/2014 2:00 Math-PT-South Po MARIES CO. R-II 
5/19/2014 2:00 SBAC-GO8-ELA-Noi MARIES CO. R-II 
5/19/2014 2:00 SBAC-GO8-Math-N MARIES CO. R-II 
5/19/2014 2:00 ELA-PT-Renewable MARION CO. R-II 
5/19/2014 2:00 SBAC-GO8-ELA-Noi MARION CO. R-II 
5/19/2014 2:00 SBAC-GO8-ELA-No! MARION CO. R-II 
5/19/2014 2:00 SBAC-GO8-ELA-Noi MARION CO. R-II 
5/19/2014 2:00 SBAC-GO8-ELA-Noi MARION CO. R-II 
5/19/2014 2:00 SBAC-GO8-ELA-Noi MARION CO. R-II 
5/19/2014 2:00 ELA-PT-Technolog\ MARIONVILLE R-IX 
5/19/2014 2:00 HS-Math-PT-Great MARIONVILLE R-IX 
5/19/2014 2:00 Math-PT-Cell Phor MARIONVILLE R-IX 
5/19/2014 2:00 Math-PT-Cell Phor MARIONVILLE R-IX 
5/19/2014 2:00 Math-PT-Donuts MARIONVILLE R-IX 
5/19/2014 2:00 SBAC-GO7-ELA-No! MARIONVILLE R-IX 
5/19/2014 2:00 SBAC-GO7-Math-N MARIONVILLE R-IX 
5/19/2014 2:00 SBAC-GO7-Math-N MARIONVILLE R-IX 
5/19/2014 2:00 SBAC-GO7-Math-N MARIONVILLE R-IX 
5/19/2014 2:00 SBAC-HS-Math-No MARIONVILLE R-IX 
5/19/2014 2:00 Math-PT-Talent Sh MARQUAND-ZION 
5/19/2014 2:00 Math-PT-Talent Sh MARQUAND-ZION 
5/19/2014 2:00 SBAC-GO6-Math-N MARQUAND-ZION 
5/19/2014 2:00 SBAC-GO6-Math-N MARQUAND-ZION 
5/19/2014 2:00 ELA-PT-The Americ MARSHALL 
5/19/2014 2:00 SBAC-GO5-ELA-Noi MARSHALL 
5/19/2014 2:00 SBAC-GO5-ELA-Noi MARSHALL 
5/19/2014 2:00 SBAC-GO5-ELA-No!i MARSHALL 
5/19/2014 2:00 SBAC-GO5-ELA-No!i MARSHALL 
5/19/2014 2:00 ELA-PT-Technolog: MARSHFIELD R-I 
5/19/2014 2:00 Math-PT-Donuts MARSHFIELD R-I 
5/19/2014 2:00 Math-PT-South Po MARSHFIELD R-I 
5/19/2014 2:00 SBAC-GO7-ELA-No! MARSHFIELD R-I 




51 




231 


51 




231 




0.00% 
0.00% 
100.00% 
95.56% 
100.00% 
91.49% 
0.00% 
95.65% 
0.00% 
97.87% 
0.00% 
79.37% 
1.89% 
84.13% 
96.00% 
80.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
69.86% 
100.00% 
79.31% 
0.00% 
0.00% 
75.00% 
80.00% 
78.57% 
69.86% 
0.00% 
0.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
50.00% 
95.85% 
50.00% 


0.00% 
0.00% 
100.00% 
95.56% 
100.00% 
91.49% 
0.00% 
95.65% 
0.00% 
97.87% 
0.00% 
79.37% 
0.00% 
84.13% 
96.00% 
80.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
69.86% 
100.00% 
79.31% 
0.00% 
0.00% 
75.00% 
80.00% 
78.57% 
69.86% 
0.00% 
0.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
50.00% 
95.85% 
0.00% 


5/19/2014 2:00 SBAC-GO7-Math-N MARSHFIELD R-I 
5/19/2014 2:00 SBAC-GO7-Math-N MARSHFIELD R-I 
5/19/2014 2:00 SBAC-GO7-Math-N MARSHFIELD R-I 
5/19/2014 2:00 SBAC-GO7-Math-N MARSHFIELD R-I 
5/19/2014 2:00 SBAC-GO7-Math-N MARSHFIELD R-I 
5/19/2014 2:00 Math-PT-Donuts-A MCDONALD CO. R: 
5/19/2014 2:00 SBAC-GO6-Math-N MCDONALD CO. R: 
5/19/2014 2:00 SBAC-GO6-Math-N MCDONALD CO. R. 
5/19/2014 2:00 SBAC-GO6-Math-N MCDONALD CO. R: 
5/19/2014 2:00 SBAC-GO6-Math-N MCDONALD CO. R: 
5/19/2014 2:00 Math-PT-Camping- MEADOW HEIGHT 
5/19/2014 2:00 SBAC-GO7-Math-N MEADOW HEIGHT 
5/19/2014 2:00 SBAC-GO7-Math-N MEADOW HEIGHT 
5/19/2014 2:00 SBAC-GO7-Math-N MEADOW HEIGHT 
5/19/2014 2:00 SBAC-GO7-Math-N MEADOW HEIGHT 
5/19/2014 2:00 ELA-PT-Trees-A MEHLVILLE R-IX 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MEHLVILLE R-IX 
5/19/2014 2:00 SBAC-GO4-ELA-No! MEHLVILLE R-IX 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MEHLVILLE R-IX 
5/19/2014 2:00 SBAC-GO4-ELA-No: MEHLVILLE R-IX 
5/19/2014 2:00 SBAC-GO4-ELA-Noi MEHLVILLE R-IX 
5/19/2014 2:00 ELA-PT-Marine Ani MERAMEC VALLEY 
5/19/2014 2:00 ELA-PT-Technolog’s MERAMEC VALLEY 
5/19/2014 2:00 ELA-PT-Trees-A MERAMEC VALLEY 
5/19/2014 2:00 Math-PT-Camping: MERAMEC VALLEY 
5/19/2014 2:00 Math-PT-Donuts MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO4-ELA-No! MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO04-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO04-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-G04-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-G04-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO5-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO5-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO5-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO5-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO7-ELA-Noi MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO7-Math-N MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO7-Math-N MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO7-Math-N MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO7-Math-N MERAMEC VALLEY 
5/19/2014 2:00 SBAC-GO7-Math-N MERAMEC VALLEY 
5/19/2014 2:00 ELA-PT-Technolog\ MEXICO 59 
5/19/2014 2:00 HS-ELA-PT-A New MEXICO 59 




100.00% 
100.00% 
100.00% 
90.48% 
95.16% 
89.74% 
100.00% 
100.00% 
100.00% 
88.89% 
90.32% 
100.00% 
100.00% 
100.00% 
100.00% 
92.94% 
82.35% 
94.12% 
100.00% 
88.24% 
88.24% 
90.28% 
0.00% 
93.22% 
92.99% 
100.00% 
100.00% 
83.33% 
91.67% 
91.67% 
81.82% 
100.00% 
83.33% 
100.00% 
94.44% 
0.00% 
100.00% 
100.00% 
100.00% 
91.23% 
91.07% 
97.37% 
0.00% 


100.00% 
100.00% 
100.00% 
90.48% 
95.16% 
89.74% 
100.00% 
100.00% 
100.00% 
88.89% 
90.32% 
100.00% 
100.00% 
100.00% 
100.00% 
92.94% 
76.47% 
94.12% 
100.00% 
88.24% 
88.24% 
90.28% 
0.00% 
93.22% 
91.59% 
100.00% 
100.00% 
83.33% 
91.67% 
91.67% 
81.82% 
88.89% 
83.33% 
100.00% 
94.44% 
0.00% 
93.88% 
92.31% 
100.00% 
89.47% 
87.50% 
96.84% 
0.00% 


5/19/2014 2:00 HS-Math-PT-Great MEXICO 59
5/19/2014 2:00 Math-PT-Donuts MEXICO 59
5/19/2014 2:00 Math-PT-Making S MEXICO 59
5/19/2014 2:00 Math-PT-Making S MEXICO 59
5/19/2014 2:00 SBAC-G03-Math-N MEXICO 59
5/19/2014 2:00 SBAC-G03-Math-N MEXICO 59
5/19/2014 2:00 SBAC-G03-Math-N MEXICO 59
5/19/2014 2:00 SBAC-G07-ELA-Non MEXICO 59
5/19/2014 2:00 SBAC-G07-Math-N MEXICO 59
5/19/2014 2:00 SBAC-HS-ELA-Non MEXICO 59
5/19/2014 2:00 SBAC-HS-Math-No MEXICO 59
5/19/2014 2:00 ELA-PT-Marine Ani MILLER CO. R-III
5/19/2014 2:00 HS-Math-PT-Great MILLER CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Non MILLER CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Non MILLER CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Non MILLER CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Non MILLER CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Non MILLER CO. R-III
5/19/2014 2:00 SBAC-HS-Math-No MILLER CO. R-III
5/19/2014 2:00 HS-Math-PT-Great MO SCHOOL FOR 1
5/19/2014 2:00 SBAC-HS-Math-No MO SCHOOL FOR 1
5/19/2014 2:00 Math-PT-Baseball MOBERLY
5/19/2014 2:00 Math-PT-Baseball- MOBERLY
5/19/2014 2:00 Math-PT-Turtle Ha MOBERLY
5/19/2014 2:00 Math-PT-Turtle Ha MOBERLY
5/19/2014 2:00 SBAC-G05-Math-N MOBERLY
5/19/2014 2:00 SBAC-G05-Math-N MOBERLY
5/19/2014 2:00 SBAC-G05-Math-N MOBERLY
5/19/2014 2:00 SBAC-G05-Math-N MOBERLY
5/19/2014 2:00 SBAC-G05-Math-N MOBERLY
5/19/2014 2:00 SBAC-G08-Math-N MOBERLY
5/19/2014 2:00 SBAC-G08-Math-N MOBERLY
5/19/2014 2:00 SBAC-G08-Math-N MOBERLY
5/19/2014 2:00 SBAC-G08-Math-N MOBERLY
5/19/2014 2:00 ELA-PT-Growth an MONETT R-I
5/19/2014 2:00 Math-PT-Talent Sh MONETT R-I
5/19/2014 2:00 Math-PT-Talent Sh MONETT R-I
5/19/2014 2:00 SBAC-G06-ELA-Non MONETT R-I
5/19/2014 2:00 SBAC-G06-Math-N MONETT R-I
5/19/2014 2:00 SBAC-G06-Math-N MONETT R-I
5/19/2014 2:00 ELA-PT-Uncommo MONITEAU CO. R-I
5/19/2014 2:00 SBAC-G04-ELA-Non MONITEAU CO. R-I
5/19/2014 2:00 SBAC-G04-ELA-Non MONITEAU CO. R-I




86.29% 
0.00% 
97.96% 
94.12% 
75.00% 
100.00% 
97.96% 
98.42% 
0.00% 
0.00% 
0.00% 
100.00% 
95.24% 
100.00% 
66.67% 
100.00% 
100.00% 
100.00% 
95.24% 
0.00% 
0.00% 
100.00% 
76.61% 
100.00% 
91.98% 
95.12% 
95.00% 
100.00% 
90.24% 
90.00% 
100.00% 
73.91% 
75.00% 
85.29% 
0.00% 
96.51% 
0.00% 
13.86% 
96.51% 
0.00% 
93.48% 
94.74% 
88.89% 


86.29% 
0.00% 
97.96% 
94.12% 
75.00% 
100.00% 
97.96% 
98.42% 
0.00% 
0.00% 
0.00% 
100.00% 
95.24% 
100.00% 
66.67% 
100.00% 
100.00% 
100.00% 
95.24% 
0.00% 
0.00% 
100.00% 
76.61% 
100.00% 
91.98% 
92.68% 
87.50% 
100.00% 
90.24% 
90.00% 
100.00% 
73.91% 
75.00% 
85.29% 
0.00% 
96.51% 
0.00% 
0.00% 
96.51% 
0.00% 
93.48% 
94.74% 
88.89% 


5/19/2014 2:00 SBAC-G04-ELA-Non MONITEAU CO. R-I
5/19/2014 2:00 SBAC-G04-ELA-Non MONITEAU CO. R-I
5/19/2014 2:00 SBAC-G04-ELA-Non MONITEAU CO. R-I
5/19/2014 2:00 ELA-PT-Technolog MONROE CITY R-I
5/19/2014 2:00 SBAC-G06-ELA-Non MONROE CITY R-I
5/19/2014 2:00 SBAC-G06-ELA-Non MONROE CITY R-I
5/19/2014 2:00 SBAC-G06-ELA-Non MONROE CITY R-I
5/19/2014 2:00 SBAC-G06-ELA-Non MONROE CITY R-I
5/19/2014 2:00 HS-Math-PT-Great MONTROSE R-XIV
5/19/2014 2:00 SBAC-HS-Math-No MONTROSE R-XIV
5/19/2014 2:00 Math-PT-Sandbox- MORGAN CO. R-II
5/19/2014 2:00 SBAC-G06-Math-N MORGAN CO. R-II
5/19/2014 2:00 SBAC-G06-Math-N MORGAN CO. R-II
5/19/2014 2:00 HS-Math-PT-Great NEELYVILLE R-IV
5/19/2014 2:00 Math-PT-School Li NEELYVILLE R-IV
5/19/2014 2:00 SBAC-G03-Math-N NEELYVILLE R-IV
5/19/2014 2:00 SBAC-HS-Math-No NEELYVILLE R-IV
5/19/2014 2:00 ELA-PT-Marine Ani NELL HOLCOMB R-
5/19/2014 2:00 SBAC-G04-ELA-Non NELL HOLCOMB R-
5/19/2014 2:00 SBAC-G04-ELA-Non NELL HOLCOMB R-
5/19/2014 2:00 SBAC-G04-ELA-Non NELL HOLCOMB R-
5/19/2014 2:00 SBAC-G04-ELA-Non NELL HOLCOMB R-
5/19/2014 2:00 SBAC-G04-ELA-Non NELL HOLCOMB R-
5/19/2014 2:00 ELA-PT-Renewable NEOSHO R-V
5/19/2014 2:00 ELA-PT-Uncommo NEOSHO R-V
5/19/2014 2:00 HS-Math-PT-Great NEOSHO R-V
5/19/2014 2:00 Math-PT-Animal Jt NEOSHO R-V
5/19/2014 2:00 Math-PT-Animal Jt NEOSHO R-V
5/19/2014 2:00 SBAC-G04-ELA-Non NEOSHO R-V
5/19/2014 2:00 SBAC-G04-Math-N NEOSHO R-V
5/19/2014 2:00 SBAC-G04-Math-N NEOSHO R-V
5/19/2014 2:00 SBAC-G04-Math-N NEOSHO R-V
5/19/2014 2:00 SBAC-G04-Math-N NEOSHO R-V
5/19/2014 2:00 SBAC-G04-Math-N NEOSHO R-V
5/19/2014 2:00 SBAC-G04-Math-N NEOSHO R-V
5/19/2014 2:00 SBAC-G08-ELA-Non NEOSHO R-V
5/19/2014 2:00 SBAC-G08-ELA-Non NEOSHO R-V
5/19/2014 2:00 SBAC-G08-ELA-Non NEOSHO R-V
5/19/2014 2:00 SBAC-G08-ELA-Non NEOSHO R-V
5/19/2014 2:00 SBAC-G08-ELA-Non NEOSHO R-V
5/19/2014 2:00 SBAC-HS-Math-No NEOSHO R-V
5/19/2014 2:00 HS-ELA-PT-A New NEVADA R-V
5/19/2014 2:00 HS-ELA-PT-A New NEVADA R-V




100.00% 
94.44% 
100.00% 
14.00% 
100.00% 
83.33% 
100.00% 
83.33% 
0.00% 
0.00% 
94.64% 
94.64% 
94.64% 
78.57% 
0.00% 
0.00% 
78.57% 
96.30% 
100.00% 
80.00% 
100.00% 
100.00% 
100.00% 
87.72% 
0.00% 
86.49% 
0.00% 
0.00% 
0.00% 
100.00% 
90.91% 
100.00% 
86.36% 
91.30% 
95.45% 
91.30% 
100.00% 
86.96% 
100.00% 
90.91% 
86.49% 
100.00% 
0.00% 


100.00% 
94.44% 
100.00% 
8.00% 
69.23% 
50.00% 
53.85% 
58.33% 
0.00% 
0.00% 
94.64% 
94.64% 
94.64% 
78.57% 
0.00% 
0.00% 
78.57% 
96.30% 
100.00% 
80.00% 
100.00% 
100.00% 
100.00% 
85.09% 
0.00% 
86.49% 
0.00% 
0.00% 
0.00% 
100.00% 
90.91% 
100.00% 
86.36% 
91.30% 
95.45% 
86.96% 
91.30% 
78.26% 
95.65% 
86.36% 
86.49% 
0.00% 
0.00% 


5/19/2014 2:00 Math-PT-Commun NEVADA R-V
5/19/2014 2:00 Math-PT-Commun NEVADA R-V
5/19/2014 2:00 SBAC-G05-Math-N NEVADA R-V
5/19/2014 2:00 SBAC-G05-Math-N NEVADA R-V
5/19/2014 2:00 SBAC-G05-Math-N NEVADA R-V
5/19/2014 2:00 SBAC-HS-ELA-Non NEVADA R-V
5/19/2014 2:00 SBAC-HS-ELA-Non NEVADA R-V
5/19/2014 2:00 SBAC-HS-ELA-Non NEVADA R-V
5/19/2014 2:00 ELA-PT-Technolog NEW MADRID CO.
5/19/2014 2:00 Math-PT-Turtle Ha NEW MADRID CO.
5/19/2014 2:00 SBAC-G05-Math-N NEW MADRID CO.
5/19/2014 2:00 SBAC-G07-ELA-Non NEW MADRID CO.
5/19/2014 2:00 ELA-PT-Animals W NIANGUA R-V
5/19/2014 2:00 ELA-PT-Animals W NIANGUA R-V
5/19/2014 2:00 HS-ELA-PT-A New NIANGUA R-V
5/19/2014 2:00 HS-ELA-PT-A New NIANGUA R-V
5/19/2014 2:00 SBAC-G03-ELA-Non NIANGUA R-V
5/19/2014 2:00 SBAC-G03-ELA-Non NIANGUA R-V
5/19/2014 2:00 SBAC-G03-ELA-Non NIANGUA R-V
5/19/2014 2:00 SBAC-HS-ELA-Non NIANGUA R-V
5/19/2014 2:00 SBAC-HS-ELA-Non NIANGUA R-V
5/19/2014 2:00 SBAC-HS-ELA-Non NIANGUA R-V
5/19/2014 2:00 ELA-PT-Importanc NORBORNE R-VIII
5/19/2014 2:00 SBAC-G07-ELA-Non NORBORNE R-VIII
5/19/2014 2:00 SBAC-G07-ELA-Non NORBORNE R-VIII
5/19/2014 2:00 SBAC-G07-ELA-Non NORBORNE R-VIII
5/19/2014 2:00 SBAC-G07-ELA-Non NORBORNE R-VIII
5/19/2014 2:00 ELA-PT-Uncommo NORTH CALLAWAY
5/19/2014 2:00 ELA-PT-Uncommo NORTH CALLAWAY
5/19/2014 2:00 HS-ELA-PT-A New NORTH CALLAWAY
5/19/2014 2:00 HS-ELA-PT-A New NORTH CALLAWAY
5/19/2014 2:00 Math-PT-Order Fo NORTH CALLAWAY
5/19/2014 2:00 SBAC-G03-Math-N NORTH CALLAWAY
5/19/2014 2:00 SBAC-G04-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G04-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G04-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G04-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G04-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G04-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non NORTH CALLAWAY


122

33

22

32

22


100.00% 
50.00% 
100.00% 
100.00% 
100.00% 
100.00% 
75.41% 
75.61% 
0.00% 
0.00% 
0.00% 
0.00% 
50.00% 
95.65% 
81.25% 
0.00% 
87.50% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
97.06% 
81.52% 
100.00% 
0.00% 
95.65% 
100.00% 
100.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 


100.00% 
50.00% 
100.00% 
100.00% 
100.00% 
100.00% 
61.48% 
60.98% 
0.00% 
0.00% 
0.00% 
0.00% 
50.00% 
95.65% 
81.25% 
0.00% 
75.00% 
100.00% 
88.89% 
100.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
94.12% 
81.52% 
100.00% 
0.00% 
95.65% 
85.71% 
57.14% 
0.00% 
71.43% 
100.00% 
50.00% 
100.00% 
100.00% 
100.00% 
100.00% 


5/19/2014 2:00 SBAC-G08-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 SBAC-HS-ELA-Non NORTH CALLAWAY
5/19/2014 2:00 HS-Math-PT-Great NORTH HARRISON
5/19/2014 2:00 SBAC-HS-Math-No NORTH HARRISON
5/19/2014 2:00 HS-Math-PT-Great NORTH MERCER C
5/19/2014 2:00 SBAC-HS-Math-No NORTH MERCER C
5/19/2014 2:00 Math-PT-Sandbox- NORTH NODAWAY
5/19/2014 2:00 SBAC-G06-Math-N NORTH NODAWAY
5/19/2014 2:00 SBAC-G06-Math-N NORTH NODAWAY
5/19/2014 2:00 SBAC-G06-Math-N NORTH NODAWAY
5/19/2014 2:00 SBAC-G06-Math-N NORTH NODAWAY
5/19/2014 2:00 Math-PT-South Po NORTH PLATTE CO
5/19/2014 2:00 SBAC-G08-Math-N NORTH PLATTE CO
5/19/2014 2:00 SBAC-G08-Math-N NORTH PLATTE CO
5/19/2014 2:00 SBAC-G08-Math-N NORTH PLATTE CO
5/19/2014 2:00 SBAC-G08-Math-N NORTH PLATTE CO
5/19/2014 2:00 SBAC-G08-Math-N NORTH PLATTE CO
5/19/2014 2:00 Math-PT-Donuts-A NORTHEAST VERN
5/19/2014 2:00 SBAC-G07-Math-N NORTHEAST VERN
5/19/2014 2:00 SBAC-G07-Math-N NORTHEAST VERN
5/19/2014 2:00 ELA-PT-Deserts NORTHWEST R-I
5/19/2014 2:00 ELA-PT-Deserts-A NORTHWEST R-I
5/19/2014 2:00 ELA-PT-Trees NORTHWEST R-I
5/19/2014 2:00 ELA-PT-Trees-A NORTHWEST R-I
5/19/2014 2:00 HS-ELA-PT-A New NORTHWEST R-I
5/19/2014 2:00 SBAC-G04-ELA-Non NORTHWEST R-I
5/19/2014 2:00 SBAC-G04-ELA-Non NORTHWEST R-I
5/19/2014 2:00 SBAC-G04-ELA-Non NORTHWEST R-I
5/19/2014 2:00 SBAC-G04-ELA-Non NORTHWEST R-I
5/19/2014 2:00 SBAC-G04-ELA-Non NORTHWEST R-I
5/19/2014 2:00 SBAC-G04-ELA-Non NORTHWEST R-I
5/19/2014 2:00 SBAC-HS-ELA-Non NORTHWEST R-I
5/19/2014 2:00 ELA-PT-Trees-A OAK GROVE R-VI
5/19/2014 2:00 SBAC-G03-ELA-Non OAK GROVE R-VI
5/19/2014 2:00 SBAC-G03-ELA-Non OAK GROVE R-VI
5/19/2014 2:00 SBAC-G03-ELA-Non OAK GROVE R-VI
5/19/2014 2:00 HS-Math-PT-Great OAK RIDGE R-VI
5/19/2014 2:00 SBAC-HS-Math-No OAK RIDGE R-VI
5/19/2014 2:00 HS-ELA-PT-A New ODESSA R-VII
5/19/2014 2:00 Math-PT-Talent Sh ODESSA R-VII
5/19/2014 2:00 Math-PT-Talent Sh ODESSA R-VII
5/19/2014 2:00 SBAC-G06-Math-N ODESSA R-VII
5/19/2014 2:00 SBAC-G06-Math-N ODESSA R-VII




100.00% 
81.52% 
86.36% 
86.36% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 
83.33% 
85.71% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 
92.86% 
66.67% 
97.53% 
86.19% 
93.33% 

100.00% 

100.00% 
96.67% 

100.00% 
93.55% 
86.58% 

0.00% 
0.00% 
0.00% 
0.00% 
92.31% 
96.15% 
0.00% 

100.00% 
95.86% 

100.00% 

100.00% 


100.00% 
52.17% 
86.36% 
86.36% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 
83.33% 
85.71% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 
90.00% 
66.67% 
97.53% 
85.80% 
93.33% 

100.00% 

100.00% 
96.67% 
96.67% 
93.55% 
86.19% 

0.00% 
0.00% 
0.00% 
0.00% 
92.31% 
96.15% 
0.00% 

100.00% 
95.86% 

100.00% 

100.00% 


5/19/2014 2:00 SBAC-G06-Math-N ODESSA R-VII
5/19/2014 2:00 SBAC-G06-Math-N ODESSA R-VII
5/19/2014 2:00 SBAC-G06-Math-N ODESSA R-VII
5/19/2014 2:00 SBAC-HS-ELA-Non ODESSA R-VII
5/19/2014 2:00 SBAC-HS-ELA-Non ODESSA R-VII
5/19/2014 2:00 ELA-PT-Land Form ORCHARD FARM R
5/19/2014 2:00 SBAC-G03-ELA-Non ORCHARD FARM R
5/19/2014 2:00 HS-Math-PT-Great OSAGE CO. R-I
5/19/2014 2:00 SBAC-HS-Math-No OSAGE CO. R-I
5/19/2014 2:00 ELA-PT-Growth an OSAGE CO. R-II
5/19/2014 2:00 HS-Math-PT-Great OSAGE CO. R-II
5/19/2014 2:00 Math-PT-Talent Sh OSAGE CO. R-II
5/19/2014 2:00 SBAC-G06-ELA-Non OSAGE CO. R-II
5/19/2014 2:00 SBAC-G06-Math-N OSAGE CO. R-II
5/19/2014 2:00 SBAC-HS-Math-No OSAGE CO. R-II
5/19/2014 2:00 HS-Math-PT-Great OSCEOLA
5/19/2014 2:00 SBAC-HS-Math-No OSCEOLA
5/19/2014 2:00 ELA-PT-Renewable OTTERVILLE R-VI
5/19/2014 2:00 Math-PT-Order Fo OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G03-Math-N OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G03-Math-N OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G03-Math-N OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Non OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Non OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Non OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Non OTTERVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Non OTTERVILLE R-VI
5/19/2014 2:00 HS-ELA-PT-A New PARIS R-II
5/19/2014 2:00 HS-Math-PT-Great PARIS R-II
5/19/2014 2:00 SBAC-G08-ELA-Non PARIS R-II
5/19/2014 2:00 SBAC-G08-ELA-Non PARIS R-II
5/19/2014 2:00 SBAC-G08-ELA-Non PARIS R-II
5/19/2014 2:00 SBAC-G08-ELA-Non PARIS R-II
5/19/2014 2:00 SBAC-G08-ELA-Non PARIS R-II
5/19/2014 2:00 SBAC-HS-Math-No PARIS R-II
5/19/2014 2:00 ELA-PT-Growth an PARK HILL
5/19/2014 2:00 SBAC-G05-ELA-Non PARK HILL
5/19/2014 2:00 SBAC-G05-ELA-Non PARK HILL
5/19/2014 2:00 SBAC-G05-ELA-Non PARK HILL
5/19/2014 2:00 SBAC-G05-ELA-Non PARK HILL
5/19/2014 2:00 Math-PT-Order Fo PARKWAY C-2
5/19/2014 2:00 Math-PT-Order Fo PARKWAY C-2
5/19/2014 2:00 SBAC-G03-Math-N PARKWAY C-2


21

45

45


100.00% 
95.38% 
93.94% 

0.00% 
0.00% 
90.08% 
90.84% 
100.00% 
0.00% 
0.00% 
100.00% 
100.00% 
0.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 

100.00% 
96.97% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 
96.88% 
78.46% 
93.75% 

100.00% 
87.50% 

100.00% 
33.33% 
92.00% 
96.00% 


100.00% 
95.38% 
93.94% 

0.00% 
0.00% 
86.26% 
89.31% 
100.00% 
0.00% 
0.00% 
100.00% 
100.00% 
0.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 

100.00% 
96.97% 

100.00% 

100.00% 

100.00% 

100.00% 

100.00% 
96.88% 
61.54% 
81.25% 
75.00% 
87.50% 
88.24% 
33.33% 
88.00% 
92.00% 


5/19/2014 2:00 SBAC-G03-Math-N PARKWAY C-2
5/19/2014 2:00 SBAC-G03-Math-N PARKWAY C-2
5/19/2014 2:00 SBAC-G03-Math-N PARKWAY C-2
5/19/2014 2:00 HS-ELA-PT-A New PATTONSBURG R-I
5/19/2014 2:00 SBAC-HS-ELA-Non PATTONSBURG R-I
5/19/2014 2:00 Math-PT-Camping PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Camping- PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Commun PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Commun PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Sandbox PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Sandbox- PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Turtle Ha PATTONVILLE R-III
5/19/2014 2:00 Math-PT-Turtle Ha PATTONVILLE R-III
5/19/2014 2:00 SBAC-G04-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G04-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G04-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G04-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G04-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G04-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G05-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G05-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G05-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G05-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G05-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G06-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G06-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G06-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G07-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G07-Math-N PATTONVILLE R-III
5/19/2014 2:00 SBAC-G07-Math-N PATTONVILLE R-III
5/19/2014 2:00 ELA-PT-Space Expl PERRY CO. 32
5/19/2014 2:00 SBAC-G08-ELA-Non PERRY CO. 32
5/19/2014 2:00 SBAC-G08-ELA-Non PERRY CO. 32
5/19/2014 2:00 SBAC-G08-ELA-Non PERRY CO. 32
5/19/2014 2:00 SBAC-G08-ELA-Non PERRY CO. 32
5/19/2014 2:00 SBAC-G08-ELA-Non PERRY CO. 32
5/19/2014 2:00 ELA-PT-Renewable PETTIS CO. R-V
5/19/2014 2:00 SBAC-G08-ELA-Non PETTIS CO. R-V
5/19/2014 2:00 SBAC-G08-ELA-Non PETTIS CO. R-V
5/19/2014 2:00 SBAC-G08-ELA-Non PETTIS CO. R-V
5/19/2014 2:00 SBAC-G08-ELA-Non PETTIS CO. R-V
5/19/2014 2:00 SBAC-G08-ELA-Non PETTIS CO. R-V
5/19/2014 2:00 Math-PT-Donuts-A PETTIS CO. R-XII


14

16

1

13

12

1

12


96.00% 
84.00% 
33.33% 
100.00% 
100.00% 
0.00% 
0.00% 
100.00% 
43.22% 
0.00% 
0.00% 
100.00% 
86.76% 
30.43% 
47.83% 
100.00% 
56.00% 
39.13% 
45.83% 
94.12% 
88.24% 
0.00% 
88.24% 
94.12% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
94.74% 
75.00% 
100.00% 
100.00% 
100.00% 
100.00% 
92.86% 


80.00% 
84.00% 
33.33% 
100.00% 
100.00% 
0.00% 
0.00% 
100.00% 
42.37% 
0.00% 
0.00% 
100.00% 
82.35% 
30.43% 
43.48% 
100.00% 
52.00% 
39.13% 
41.67% 
88.24% 
64.71% 
0.00% 
76.47% 
70.59% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
94.74% 
75.00% 
100.00% 
100.00% 
100.00% 
100.00% 
85.71% 


5/19/2014 2:00 SBAC-G07-Math-N PETTIS CO. R-XII
5/19/2014 2:00 SBAC-G07-Math-N PETTIS CO. R-XII
5/19/2014 2:00 SBAC-G07-Math-N PETTIS CO. R-XII
5/19/2014 2:00 SBAC-G07-Math-N PETTIS CO. R-XII
5/19/2014 2:00 HS-Math-PT-Great PIERCE CITY R-VI
5/19/2014 2:00 SBAC-HS-Math-No PIERCE CITY R-VI
5/19/2014 2:00 Math-PT-Donuts PIKE CO. R-III
5/19/2014 2:00 Math-PT-Donuts-A PIKE CO. R-III
5/19/2014 2:00 SBAC-G07-Math-N PIKE CO. R-III
5/19/2014 2:00 SBAC-G07-Math-N PIKE CO. R-III
5/19/2014 2:00 HS-Math-PT-Great PILOT GROVE C-4
5/19/2014 2:00 SBAC-HS-Math-No PILOT GROVE C-4
5/19/2014 2:00 ELA-PT-Importanc PLATTE CO. R-III
5/19/2014 2:00 ELA-PT-Importanc PLATTE CO. R-III
5/19/2014 2:00 ELA-PT-Technolog PLATTE CO. R-III
5/19/2014 2:00 Math-PT-Donuts PLATTE CO. R-III
5/19/2014 2:00 SBAC-G07-ELA-Non PLATTE CO. R-III
5/19/2014 2:00 SBAC-G07-ELA-Non PLATTE CO. R-III
5/19/2014 2:00 SBAC-G07-ELA-Non PLATTE CO. R-III
5/19/2014 2:00 SBAC-G07-ELA-Non PLATTE CO. R-III
5/19/2014 2:00 SBAC-G07-ELA-Non PLATTE CO. R-III
5/19/2014 2:00 SBAC-G07-Math-N PLATTE CO. R-III
5/19/2014 2:00 HS-Math-PT-Great PLEASANT HILL R-I
5/19/2014 2:00 SBAC-HS-Math-No PLEASANT HILL R-I
5/19/2014 2:00 HS-Math-PT-Great PLEASANT HOPE R
5/19/2014 2:00 SBAC-HS-Math-No PLEASANT HOPE R
5/19/2014 2:00 HS-ELA-PT-A New PRAIRIE HOME R-V
5/19/2014 2:00 SBAC-HS-ELA-Non PRAIRIE HOME R-V
5/19/2014 2:00 Math-PT-Making S PRINCETON R-V
5/19/2014 2:00 SBAC-G03-Math-N PRINCETON R-V
5/19/2014 2:00 SBAC-G03-Math-N PRINCETON R-V
5/19/2014 2:00 SBAC-G03-Math-N PRINCETON R-V
5/19/2014 2:00 HS-Math-PT-Great PUTNAM CO. R-I
5/19/2014 2:00 SBAC-HS-Math-No PUTNAM CO. R-I
5/19/2014 2:00 HS-Math-PT-Great PUXICO R-VIII
5/19/2014 2:00 SBAC-HS-Math-No PUXICO R-VIII
5/19/2014 2:00 HS-ELA-PT-A New REEDS SPRING R-IV
5/19/2014 2:00 SBAC-HS-ELA-Non REEDS SPRING R-IV
5/19/2014 2:00 ELA-PT-Archeologi RICHARDS R-V
5/19/2014 2:00 ELA-PT-Archeologi RICHARDS R-V
5/19/2014 2:00 SBAC-G08-ELA-Non RICHARDS R-V
5/19/2014 2:00 SBAC-G08-ELA-Non RICHARDS R-V
5/19/2014 2:00 HS-ELA-PT-A New RICHLAND R-I


39
48
115
127


100.00% 
100.00% 
80.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
88.24% 
94.12% 
100.00% 
96.39% 
100.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
95.24% 
0.00% 
0.00% 
0.00% 
94.23% 
92.31% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
72.22% 
92.59% 
81.76% 
87.84% 
0.00% 
0.00% 
40.48% 
100.00% 
0.00% 


100.00% 
100.00% 
80.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
88.24% 
94.12% 
100.00% 
96.39% 
100.00% 
0.00% 
90.00% 
100.00% 
100.00% 
95.24% 
95.24% 
0.00% 
0.00% 
0.00% 
94.23% 
92.31% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
72.22% 
88.89% 
77.10% 
85.81% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 


5/19/2014 2:00 HS-Math-PT-Great RICHLAND R-I
5/19/2014 2:00 SBAC-HS-ELA-Non RICHLAND R-I
5/19/2014 2:00 SBAC-HS-Math-No RICHLAND R-I
5/19/2014 2:00 ELA-PT-Uncommo RICHMOND R-XVI
5/19/2014 2:00 ELA-PT-Uncommo RICHMOND R-XVI
5/19/2014 2:00 Math-PT-Animal Jt RICHMOND R-XVI
5/19/2014 2:00 Math-PT-Animal Jt RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-ELA-Non RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-ELA-Non RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-ELA-Non RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-ELA-Non RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-Math-N RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-Math-N RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-Math-N RICHMOND R-XVI
5/19/2014 2:00 SBAC-G04-Math-N RICHMOND R-XVI
5/19/2014 2:00 ELA-PT-Animals W RITENOUR
5/19/2014 2:00 SBAC-G03-ELA-Non RITENOUR
5/19/2014 2:00 SBAC-G03-ELA-Non RITENOUR
5/19/2014 2:00 SBAC-G03-ELA-Non RITENOUR
5/19/2014 2:00 Math-PT-Order Fo ROLLA 31
5/19/2014 2:00 Math-PT-Order Fo ROLLA 31
5/19/2014 2:00 SBAC-G03-Math-N ROLLA 31
5/19/2014 2:00 SBAC-G03-Math-N ROLLA 31
5/19/2014 2:00 SBAC-G03-Math-N ROLLA 31
5/19/2014 2:00 SBAC-G03-Math-N ROLLA 31
5/19/2014 2:00 ELA-PT-Aztec Empi SALISBURY R-IV
5/19/2014 2:00 HS-Math-PT-Great SALISBURY R-IV
5/19/2014 2:00 SBAC-G06-ELA-Non SALISBURY R-IV
5/19/2014 2:00 SBAC-HS-Math-No SALISBURY R-IV
5/19/2014 2:00 HS-Math-PT-Great SARCOXIE R-II
5/19/2014 2:00 SBAC-HS-Math-No SARCOXIE R-II
5/19/2014 2:00 HS-ELA-PT-A New SAVANNAH R-III
5/19/2014 2:00 Math-PT-Camping- SAVANNAH R-III
5/19/2014 2:00 SBAC-G08-Math-N SAVANNAH R-III
5/19/2014 2:00 SBAC-G08-Math-N SAVANNAH R-III
5/19/2014 2:00 SBAC-G08-Math-N SAVANNAH R-III
5/19/2014 2:00 SBAC-HS-ELA-Non SAVANNAH R-III
5/19/2014 2:00 SBAC-HS-ELA-Non SAVANNAH R-III
5/19/2014 2:00 HS-Math-PT-Great SCHUYLER CO. R-I
5/19/2014 2:00 SBAC-HS-Math-No SCHUYLER CO. R-I
5/19/2014 2:00 Math-PT-Turtle Ha SCOTLAND CO. R-I
5/19/2014 2:00 Math-PT-Turtle Ha SCOTLAND CO. R-I
5/19/2014 2:00 SBAC-G05-Math-N SCOTLAND CO. R-I


62

61


100.00% 
0.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
93.94% 
95.45% 
95.65% 
100.00% 
100.00% 
91.26% 
94.29% 
100.00% 
91.18% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
69.23% 
69.23% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
78.26% 
80.43% 
100.00% 
100.00% 
100.00% 


100.00% 
0.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
92.42% 
86.36% 
91.30% 
85.71% 
100.00% 
91.26% 
94.29% 
97.06% 
88.24% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
69.23% 
69.23% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
78.26% 
80.43% 
100.00% 
100.00% 
100.00% 


5/19/2014 2:00 SBAC-G05-Math-N SCOTLAND CO. R-I
5/19/2014 2:00 SBAC-G05-Math-N SCOTLAND CO. R-I
5/19/2014 2:00 SBAC-G05-Math-N SCOTLAND CO. R-I
5/19/2014 2:00 SBAC-G05-Math-N SCOTLAND CO. R-I
5/19/2014 2:00 HS-ELA-PT-A New SCOTT CO. R-IV
5/19/2014 2:00 SBAC-HS-ELA-Non SCOTT CO. R-IV
5/19/2014 2:00 SBAC-HS-ELA-Non SCOTT CO. R-IV
5/19/2014 2:00 HS-Math-PT-Great SENATH-HORNERS
5/19/2014 2:00 SBAC-HS-Math-No SENATH-HORNERS
5/19/2014 2:00 Math-PT-South Po SHAWNEE R-III
5/19/2014 2:00 SBAC-G08-Math-N SHAWNEE R-III
5/19/2014 2:00 ELA-PT-Archeologi SHELBY CO. R-IV
5/19/2014 2:00 ELA-PT-Marine Ani SHELBY CO. R-IV
5/19/2014 2:00 ELA-PT-Marine Ani SHELBY CO. R-IV
5/19/2014 2:00 ELA-PT-Uncommo SHELBY CO. R-IV
5/19/2014 2:00 Math-PT-Animal Jt SHELBY CO. R-IV
5/19/2014 2:00 Math-PT-South Po SHELBY CO. R-IV
5/19/2014 2:00 SBAC-G04-ELA-Non SHELBY CO. R-IV
5/19/2014 2:00 SBAC-G04-ELA-Non SHELBY CO. R-IV
5/19/2014 2:00 SBAC-G04-Math-N SHELBY CO. R-IV
5/19/2014 2:00 SBAC-G08-ELA-Non SHELBY CO. R-IV
5/19/2014 2:00 SBAC-G08-Math-N SHELBY CO. R-IV
5/19/2014 2:00 ELA-PT-Archeologi SHELDON R-VIII
5/19/2014 2:00 SBAC-G08-ELA-Non SHELDON R-VIII
5/19/2014 2:00 Math-PT-Baseball- SHELL KNOB 78
5/19/2014 2:00 SBAC-G08-Math-N SHELL KNOB 78
5/19/2014 2:00 SBAC-G08-Math-N SHELL KNOB 78
5/19/2014 2:00 SBAC-G08-Math-N SHELL KNOB 78
5/19/2014 2:00 SBAC-G08-Math-N SHELL KNOB 78
5/19/2014 2:00 SBAC-G08-Math-N SHELL KNOB 78
5/19/2014 2:00 ELA-PT-Space Expl SHERWOOD CASS
5/19/2014 2:00 SBAC-G08-ELA-Non SHERWOOD CASS
5/19/2014 2:00 SBAC-G08-ELA-Non SHERWOOD CASS
5/19/2014 2:00 SBAC-G08-ELA-Non SHERWOOD CASS
5/19/2014 2:00 SBAC-G08-ELA-Non SHERWOOD CASS
5/19/2014 2:00 SBAC-G08-ELA-Non SHERWOOD CASS
5/19/2014 2:00 HS-ELA-PT-A New SIKESTON R-6
5/19/2014 2:00 Math-PT-Making S SIKESTON R-6
5/19/2014 2:00 Math-PT-Making S SIKESTON R-6
5/19/2014 2:00 SBAC-G03-Math-N SIKESTON R-6
5/19/2014 2:00 SBAC-G03-Math-N SIKESTON R-6
5/19/2014 2:00 SBAC-G03-Math-N SIKESTON R-6
5/19/2014 2:00 SBAC-G03-Math-N SIKESTON R-6




100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
69.23% 
69.23% 
100.00% 
100.00% 
98.18% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
94.87% 
100.00% 
0.00% 
98.18% 
0.00% 
83.33% 
83.33% 
66.67% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
95.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
75.78% 
0.00% 
96.15% 
96.00% 
100.00% 
92.59% 
0.00% 


100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
0.00% 
0.00% 
69.23% 
69.23% 
100.00% 
100.00% 
98.18% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
87.18% 
100.00% 
0.00% 
98.18% 
0.00% 
83.33% 
83.33% 
66.67% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
95.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
75.78% 
0.00% 
96.15% 
96.00% 
100.00% 
92.59% 
0.00% 


5/19/2014 2:00 SBAC-HS-ELA-Non SIKESTON R-6
5/19/2014 2:00 ELA-PT-Animals W SILEX R-I
5/19/2014 2:00 ELA-PT-Marine Ani SILEX R-I
5/19/2014 2:00 Math-PT-Donuts-A SILEX R-I
5/19/2014 2:00 Math-PT-Making S SILEX R-I
5/19/2014 2:00 Math-PT-Making S SILEX R-I
5/19/2014 2:00 SBAC-G03-ELA-Non SILEX R-I
5/19/2014 2:00 SBAC-G03-Math-N SILEX R-I
5/19/2014 2:00 SBAC-G03-Math-N SILEX R-I
5/19/2014 2:00 SBAC-G03-Math-N SILEX R-I
5/19/2014 2:00 SBAC-G05-ELA-Non SILEX R-I
5/19/2014 2:00 SBAC-G07-Math-N SILEX R-I
5/19/2014 2:00 SBAC-G07-Math-N SILEX R-I
5/19/2014 2:00 ELA-PT-Archeologi SKYLINE R-II
5/19/2014 2:00 Math-PT-South Po SKYLINE R-II
5/19/2014 2:00 SBAC-G08-ELA-Non SKYLINE R-II
5/19/2014 2:00 SBAC-G08-Math-N SKYLINE R-II
5/19/2014 2:00 ELA-PT-Archeologi SOUTH CALLAWAY
5/19/2014 2:00 ELA-PT-Renewable SOUTH CALLAWAY
5/19/2014 2:00 Math-PT-South Po SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-ELA-Non SOUTH CALLAWAY
5/19/2014 2:00 SBAC-G08-Math-N SOUTH CALLAWAY
5/19/2014 2:00 HS-Math-PT-Great SOUTH HARRISON
5/19/2014 2:00 SBAC-HS-Math-No SOUTH HARRISON
5/19/2014 2:00 ELA-PT-Uncommo SOUTH PEMISCOT
5/19/2014 2:00 ELA-PT-Uncommo SOUTH PEMISCOT
5/19/2014 2:00 Math-PT-Animal Jt SOUTH PEMISCOT
5/19/2014 2:00 Math-PT-Animal Jt SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-ELA-Non SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-ELA-Non SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-ELA-Non SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-Math-N SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-Math-N SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-Math-N SOUTH PEMISCOT
5/19/2014 2:00 SBAC-G04-Math-N SOUTH PEMISCOT
5/19/2014 2:00 ELA-PT-Growth an SPECL. SCH. DST. S
5/19/2014 2:00 ELA-PT-Renewable SPECL. SCH. DST. S
5/19/2014 2:00 HS-ELA-PT-A New SPECL. SCH. DST. S




256 


77.13% 
0.00% 
96.43% 
86.96% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
96.43% 
90.91% 
83.33% 
0.00% 
100.00% 
0.00% 
80.00% 
0.00% 
68.85% 
0.00% 
100.00% 
100.00% 
100.00% 
66.67% 
100.00% 
92.31% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
83.33% 
90.32% 
0.00% 
0.00% 
0.00% 
100.00% 
83.33% 
87.50% 
92.86% 
0.00% 
28.57% 
91.57% 


76.17% 
0.00% 
71.43% 
86.96% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
92.86% 
90.91% 
83.33% 
0.00% 
100.00% 
0.00% 
80.00% 
0.00% 
67.21% 
0.00% 
100.00% 
100.00% 
100.00% 
66.67% 
100.00% 
92.31% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
83.33% 
90.32% 
0.00% 
0.00% 
0.00% 
100.00% 
83.33% 
87.50% 
92.86% 
0.00% 
28.57% 
80.34% 


5/19/2014 2:00 HS-Math-PT-Great SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G06-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G06-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G06-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G06-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G08-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G08-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G08-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G08-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-G08-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-HS-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-HS-ELA-Non SPECL. SCH. DST.
5/19/2014 2:00 SBAC-HS-Math-No SPECL. SCH. DST.
5/19/2014 2:00 SBAC-HS-Math-No SPECL. SCH. DST.
5/19/2014 2:00 ELA-PT-The Americ SPRINGFIELD R-XII
5/19/2014 2:00 ELA-PT-Uncommo SPRINGFIELD R-XII
5/19/2014 2:00 HS-ELA-PT-A New SPRINGFIELD R-XII
5/19/2014 2:00 HS-ELA-PT-A New SPRINGFIELD R-XII
5/19/2014 2:00 HS-Math-PT-Great SPRINGFIELD R-XII
5/19/2014 2:00 Math-PT-Making S SPRINGFIELD R-XII
5/19/2014 2:00 Math-PT-Order Fo SPRINGFIELD R-XII
5/19/2014 2:00 Math-PT-Turtle Ha SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G03-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G03-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G03-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G03-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G03-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G03-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G04-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G04-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G04-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G04-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G04-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-G05-Math-N SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-HS-ELA-Non SPRINGFIELD R-XII
5/19/2014 2:00 SBAC-HS-ELA-Non SPRINGFIELD R-XII


105

369
462
368

369
267

29

360
150

27


327 
132 


14.63% 
50.00% 
0.00% 
0.00% 
0.00% 
0.00% 
33.33% 
66.67% 
33.33% 
0.00% 
90.48% 
90.41% 
28.57% 
0.00% 
70.00% 
0.00% 
93.50% 
46.10% 
75.27% 
94.55% 
91.14% 
83.33% 
94.44% 
88.89% 
78.95% 
96.97% 
93.75% 
93.55% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
85.71% 
62.50% 
71.43% 
62.50% 
60.00% 
93.33% 
100.00% 
73.33% 
97.56% 
56.18% 


14.63% 
50.00% 
0.00% 
0.00% 
0.00% 
0.00% 
33.33% 
66.67% 
33.33% 
0.00% 
54.29% 
63.01% 
28.57% 
0.00% 
70.00% 
0.00% 
90.79% 
45.02% 
75.27% 
92.73% 
89.87% 
83.33% 
88.89% 
88.89% 
73.68% 
96.97% 
93.75% 
87.10% 
100.00% 
100.00% 
100.00% 
100.00% 
87.50% 
85.71% 
62.50% 
71.43% 
62.50% 
60.00% 
93.33% 
100.00% 
73.33% 
88.62% 
49.44% 


5/19/2014 2:00 SBAC-HS-ELA-Nonl SPRINGFIELD R-XII 
5/19/2014 2:00 SBAC-HS-Math-No SPRINGFIELD R-XII 
5/19/2014 2:00 Math-PT-Commun ST LOUIS LANG IM 
5/19/2014 2:00 SBAC-G04-Math-N ST LOUIS LANG IM
5/19/2014 2:00 SBAC-G04-Math-N ST LOUIS LANG IM
5/19/2014 2:00 SBAC-G04-Math-N ST LOUIS LANG IM
5/19/2014 2:00 SBAC-G04-Math-N ST LOUIS LANG IM
5/19/2014 2:00 SBAC-G04-Math-N ST LOUIS LANG IM
5/19/2014 2:00 ELA-PT-Animals W ST. CHARLES R-VI 
5/19/2014 2:00 ELA-PT-Technolog\ ST. CHARLES R-VI 
5/19/2014 2:00 Math-PT-Animal Jt ST. CHARLES R-VI 
5/19/2014 2:00 Math-PT-Making S ST. CHARLES R-VI 
5/19/2014 2:00 SBAC-G03-ELA-Noi ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G03-Math-N ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G03-Math-N ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G03-Math-N ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G07-ELA-Noi ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G07-ELA-No! ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G07-ELA-No! ST. CHARLES R-VI
5/19/2014 2:00 SBAC-G07-ELA-No! ST. CHARLES R-VI
5/19/2014 2:00 HS-ELA-PT-A New ST. JAMES R-I 
5/19/2014 2:00 Math-PT-Animal Jt ST. JAMES R-I 
5/19/2014 2:00 SBAC-G03-Math-N ST. JAMES R-I
5/19/2014 2:00 SBAC-G03-Math-N ST. JAMES R-I
5/19/2014 2:00 SBAC-G03-Math-N ST. JAMES R-I
5/19/2014 2:00 SBAC-HS-ELA-Nonl ST. JAMES R-I 
5/19/2014 2:00 SBAC-HS-ELA-Nonl ST. JAMES R-I 
5/19/2014 2:00 ELA-PT-Heatwaves ST. JOSEPH 
5/19/2014 2:00 ELA-PT-Importanc ST. JOSEPH 
5/19/2014 2:00 ELA-PT-Importanc: ST. JOSEPH 
5/19/2014 2:00 ELA-PT-The Ameri ST. JOSEPH 
5/19/2014 2:00 ELA-PT-Trees ST. JOSEPH 
5/19/2014 2:00 Math-PT-Cell Phor ST. JOSEPH 
5/19/2014 2:00 Math-PT-Science K ST. JOSEPH 
5/19/2014 2:00 SBAC-G03-ELA-Noi ST. JOSEPH 
5/19/2014 2:00 SBAC-G03-Math-N ST. JOSEPH
5/19/2014 2:00 SBAC-G06-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G06-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G06-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G06-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G07-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G07-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G07-ELA-Noi ST. JOSEPH


116
123


102
285


98
280


52.31% 
77.45% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
95.19% 
3.21% 
100.00% 
84.62% 
96.15% 
87.10% 
87.50% 
96.77% 
5.32% 
5.38% 
1.06% 
1.08% 
81.90% 
85.37% 
85.00% 
97.62% 
82.93% 
77.14% 
86.96% 
97.44% 
0.00% 
12.88% 
88.06% 
91.04% 
82.51% 
97.83% 
92.45% 
97.83% 
100.00% 
100.00% 
100.00% 
100.00% 
6.90% 
17.24% 
0.00% 


50.26% 
76.09% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
95.19% 
2.94% 
100.00% 
84.62% 
96.15% 
80.65% 
78.13% 
87.10% 
3.19% 
3.23% 
1.06% 
0.00% 
81.03% 
85.37% 
85.00% 
97.62% 
78.05% 
74.29% 
82.61% 
94.87% 
0.00% 
12.88% 
86.57% 
91.04% 
82.51% 
93.48% 
88.68% 
95.65% 
88.24% 
100.00% 
94.12% 
93.75% 
6.90% 
17.24% 
0.00% 


5/19/2014 2:00 SBAC-G07-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G07-ELA-Noi ST. JOSEPH
5/19/2014 2:00 SBAC-G07-Math-N ST. JOSEPH
5/19/2014 2:00 SBAC-G07-Math-N ST. JOSEPH
5/19/2014 2:00 SBAC-G07-Math-N ST. JOSEPH
5/19/2014 2:00 SBAC-G07-Math-N ST. JOSEPH
5/19/2014 2:00 ELA-PT-Marine Ani ST. LOUIS CHARTE!
5/19/2014 2:00 ELA-PT-Marine Ani ST. LOUIS CHARTE!
5/19/2014 2:00 SBAC-G05-ELA-Noi ST. LOUIS CHARTE!
5/19/2014 2:00 SBAC-G05-ELA-Noi ST. LOUIS CHARTE!
5/19/2014 2:00 SBAC-G05-ELA-Noi ST. LOUIS CHARTE!
5/19/2014 2:00 SBAC-G05-ELA-Noi ST. LOUIS CHARTE!
5/19/2014 2:00 SBAC-G05-ELA-Noi ST. LOUIS CHARTE!
5/19/2014 2:00 HS-ELA-PT-A New ST. LOUIS CITY
5/19/2014 2:00 SBAC-HS-ELA-Nonl ST. LOUIS CITY
5/19/2014 2:00 ELA-PT-Growth an STE. GENEVIEVE Ce
5/19/2014 2:00 HS-ELA-PT-A New STE. GENEVIEVE Ce
5/19/2014 2:00 HS-ELA-PT-A New STE. GENEVIEVE Ce
5/19/2014 2:00 HS-Math-PT-Great STE. GENEVIEVE Ce
5/19/2014 2:00 Math-PT-Sandbox STE. GENEVIEVE Ce
5/19/2014 2:00 Math-PT-Sandbox- STE. GENEVIEVE Ce
5/19/2014 2:00 Math-PT-Talent Sh STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-G06-ELA-Noi STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-G06-Math-N STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-G06-Math-N STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-G06-Math-N STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-G06-Math-N STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-G06-Math-N STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-HS-ELA-Nonl STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-HS-ELA-Nonl STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-HS-ELA-Nonl STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-HS-ELA-Nonl STE. GENEVIEVE Ce
5/19/2014 2:00 SBAC-HS-Math-No STE. GENEVIEVE Ce


5/19/2014 2:00 ELA-PT-Archeologi STEELVILLE R-III
5/19/2014 2:00 Math-PT-South Po STEELVILLE R-III
5/19/2014 2:00 SBAC-G08-ELA-No! STEELVILLE R-III
5/19/2014 2:00 SBAC-G08-Math-N STEELVILLE R-III
5/19/2014 2:00 ELA-PT-Technolog\ STOCKTON R-I 
5/19/2014 2:00 Math-PT-Donuts STOCKTON R-I 
5/19/2014 2:00 Math-PT-Donuts-A STOCKTON R-I 
5/19/2014 2:00 SBAC-G07-ELA-No! STOCKTON R-I
5/19/2014 2:00 SBAC-G07-Math-N STOCKTON R-I
5/19/2014 2:00 SBAC-G07-Math-N STOCKTON R-I




10.34% 
16.95% 
100.00% 
100.00% 
77.18% 
66.25% 
97.96% 
97.96% 
100.00% 
100.00% 
97.96% 
100.00% 
100.00% 
85.45% 
70.00% 
0.00% 
95.37% 
0.00% 
100.00% 
0.00% 
94.49% 
0.00% 
0.00% 
100.00% 
100.00% 
0.00% 
86.21% 
88.89% 
0.00% 
99.07% 
0.00% 
0.00% 
0.00% 
88.41% 
0.00% 
84.06% 
0.00% 
0.00% 
100.00% 
98.82% 
0.00% 
100.00% 
100.00% 


10.34% 
16.95% 
100.00% 
100.00% 
77.18% 
66.25% 
93.88% 
97.96% 
100.00% 
100.00% 
97.96% 
100.00% 
100.00% 
84.55% 
61.82% 
0.00% 
94.44% 
0.00% 
100.00% 
0.00% 
94.49% 
0.00% 
0.00% 
100.00% 
100.00% 
0.00% 
86.21% 
88.89% 
0.00% 
98.13% 
0.00% 
0.00% 
0.00% 
88.41% 
0.00% 
84.06% 
0.00% 
0.00% 
100.00% 
98.82% 
0.00% 
100.00% 
100.00% 


5/19/2014 2:00 SBAC-G07-Math-N STOCKTON R-I
5/19/2014 2:00 SBAC-G07-Math-N STOCKTON R-I
5/19/2014 2:00 SBAC-G07-Math-N STOCKTON R-I
5/19/2014 2:00 HS-Math-PT-Great STOUTLAND R-II 
5/19/2014 2:00 SBAC-HS-Math-No STOUTLAND R-II 
5/19/2014 2:00 ELA-PT-Uncommoi STRASBURG C-3 
5/19/2014 2:00 SBAC-G03-ELA-Noi STRASBURG C-3
5/19/2014 2:00 SBAC-G03-ELA-Noi STRASBURG C-3
5/19/2014 2:00 SBAC-G03-ELA-Noi STRASBURG C-3
5/19/2014 2:00 ELA-PT-Archeologi STURGEON R-V 
5/19/2014 2:00 HS-Math-PT-Great STURGEON R-V 
5/19/2014 2:00 SBAC-G08-ELA-Noi STURGEON R-V
5/19/2014 2:00 SBAC-HS-Math-No STURGEON R-V 
5/19/2014 2:00 ELA-PT-Growth an SULLIVAN 
5/19/2014 2:00 Math-PT-Talent Sh SULLIVAN 
5/19/2014 2:00 Math-PT-Talent Sh SULLIVAN 
5/19/2014 2:00 SBAC-G06-ELA-No! SULLIVAN
5/19/2014 2:00 SBAC-G06-Math-N SULLIVAN
5/19/2014 2:00 SBAC-G06-Math-N SULLIVAN
5/19/2014 2:00 HS-Math-PT-Great SUMMERSVILLE R- 
5/19/2014 2:00 SBAC-HS-Math-No SUMMERSVILLE R- 
5/19/2014 2:00 ELA-PT-Technolog TINA-AVALON R-II 
5/19/2014 2:00 Math-PT-Donuts TINA-AVALON R-II 
5/19/2014 2:00 SBAC-G07-ELA-No! TINA-AVALON R-II
5/19/2014 2:00 SBAC-G07-Math-N TINA-AVALON R-II
5/19/2014 2:00 HS-ELA-PT-A New TIPTON R-VI 
5/19/2014 2:00 SBAC-G08-ELA-Noi TIPTON R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi TIPTON R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi TIPTON R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi TIPTON R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi TIPTON R-VI
5/19/2014 2:00 HS-ELA-PT-A New TRENTON R-IX 
5/19/2014 2:00 SBAC-HS-ELA-NonI TRENTON R-IX 
5/19/2014 2:00 SBAC-HS-ELA-NonI TRENTON R-IX 
5/19/2014 2:00 ELA-PT-Marine Ani TRI-COUNTY R-VII 
5/19/2014 2:00 ELA-PT-The Americ TRI-COUNTY R-VII 
5/19/2014 2:00 ELA-PT-The Americ TRI-COUNTY R-VII 
5/19/2014 2:00 Math-PT-Turtle Ha TRI-COUNTY R-VII 
5/19/2014 2:00 SBAC-G05-ELA-No! TRI-COUNTY R-VII
5/19/2014 2:00 SBAC-G05-ELA-Noi TRI-COUNTY R-VII
5/19/2014 2:00 SBAC-G05-ELA-No! TRI-COUNTY R-VII
5/19/2014 2:00 SBAC-G05-ELA-No! TRI-COUNTY R-VII
5/19/2014 2:00 SBAC-G05-ELA-Noi TRI-COUNTY R-VII


25


100.00% 
95.45% 
100.00% 
96.77% 
96.77% 
88.24% 
100.00% 
100.00% 
80.00% 
100.00% 
86.11% 
100.00% 
88.89% 
0.00% 
90.36% 
0.00% 
0.00% 
91.57% 
0.00% 
70.37% 
70.37% 
100.00% 
0.00% 
100.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
100.00% 
81.82% 
94.00% 
92.59% 
0.00% 
100.00% 
100.00% 
0.00% 
100.00% 
100.00% 
40.00% 
100.00% 
100.00% 


100.00% 
95.45% 
100.00% 
96.77% 
96.77% 
88.24% 
100.00% 
100.00% 
80.00% 
100.00% 
86.11% 
71.43% 
88.89% 
0.00% 
84.94% 
0.00% 
0.00% 
89.16% 
0.00% 
70.37% 
66.67% 
100.00% 
0.00% 
100.00% 
0.00% 
92.86% 
50.00% 
37.50% 
77.18% 
25.00% 
66.67% 
81.82% 
64.00% 
92.59% 
0.00% 
100.00% 
100.00% 
0.00% 
100.00% 
100.00% 
40.00% 
100.00% 
100.00% 


5/19/2014 2:00 SBAC-G05-Math-N TRI-COUNTY R-VII


5/19/2014 2:00 Math-PT-Commun TROY R-III
5/19/2014 2:00 Math-PT-Making S TROY R-III 
5/19/2014 2:00 SBAC-G03-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G03-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G03-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G04-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G04-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G04-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G04-Math-N TROY R-III 
5/19/2014 2:00 SBAC-G04-Math-N TROY R-III 
5/19/2014 2:00 HS-ELA-PT-A New TWIN RIVERS R-X 
5/19/2014 2:00 HS-ELA-PT-A New TWIN RIVERS R-X 
5/19/2014 2:00 SBAC-HS-ELA-Non! TWIN RIVERS R-X 
5/19/2014 2:00 SBAC-HS-ELA-NonI TWIN RIVERS R-X 
5/19/2014 2:00 SBAC-HS-ELA-Non! TWIN RIVERS R-X 
5/19/2014 2:00 Math-PT-Donuts-A UNION R-XI 
5/19/2014 2:00 SBAC-G07-Math-N UNION R-XI
5/19/2014 2:00 SBAC-G07-Math-N UNION R-XI
5/19/2014 2:00 ELA-PT-Renewable UNION STAR R-II 
5/19/2014 2:00 Math-PT-Sandbox- UNION STAR R-II 
5/19/2014 2:00 SBAC-G05-Math-N UNION STAR R-II
5/19/2014 2:00 SBAC-G05-Math-N UNION STAR R-II
5/19/2014 2:00 SBAC-G05-Math-N UNION STAR R-II
5/19/2014 2:00 SBAC-G05-Math-N UNION STAR R-II
5/19/2014 2:00 SBAC-G08-ELA-No! UNION STAR R-II
5/19/2014 2:00 SBAC-G08-ELA-No! UNION STAR R-II
5/19/2014 2:00 SBAC-G08-ELA-No! UNION STAR R-II
5/19/2014 2:00 SBAC-G08-ELA-No! UNION STAR R-II
5/19/2014 2:00 SBAC-G08-ELA-No! UNION STAR R-II
5/19/2014 2:00 Math-PT-Animal Jt VALLEY PARK 
5/19/2014 2:00 SBAC-G04-Math-N VALLEY PARK 
5/19/2014 2:00 SBAC-G04-Math-N VALLEY PARK 
5/19/2014 2:00 SBAC-G04-Math-N VALLEY PARK 
5/19/2014 2:00 SBAC-G04-Math-N VALLEY PARK 
5/19/2014 2:00 SBAC-G04-Math-N VALLEY PARK 
5/19/2014 2:00 HS-Math-PT-Great VALLEY R-VI 
5/19/2014 2:00 SBAC-HS-Math-No VALLEY R-VI 
5/19/2014 2:00 Math-PT-Camping VAN BUREN R-I 
5/19/2014 2:00 Math-PT-Camping: VAN BUREN R-I 
5/19/2014 2:00 SBAC-G08-Math-N VAN BUREN R-I
5/19/2014 2:00 SBAC-G08-Math-N VAN BUREN R-I
5/19/2014 2:00 SBAC-G08-Math-N VAN BUREN R-I


11


18


1


27


1


27 


0.00% 
95.59% 
94.74% 
100.00% 
100.00% 
85.71% 
92.31% 
92.86% 
84.62% 
92.86% 
100.00% 
100.00% 
0.00% 
100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
81.82% 
90.91% 
100.00% 
100.00% 
100.00% 

66.67% 
100.00% 
100.00% 
100.00% 

50.00% 
100.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 
84.38% 
84.38% 

100.00% 
86.49% 
100.00% 
93.75% 
71.43% 


0.00% 
95.59% 
94.74% 

100.00% 
100.00% 
85.71% 
84.62% 
92.86% 
84.62% 
92.86% 
100.00% 
100.00% 
0.00% 
100.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 
72.73% 
90.91% 

100.00% 
100.00% 
100.00% 
66.67% 
50.00% 
50.00% 
100.00% 
50.00% 
100.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 

0.00% 
84.38% 
84.38% 

100.00% 
86.49% 
100.00% 
93.75% 
71.43% 


5/19/2014 2:00 SBAC-G08-Math-N VAN BUREN R-I


5/19/2014 2:00 ELA-PT-Animals W WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Archeologi WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Archeologi WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Marine Ani WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Marine Ani WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Technolog’ WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Uncommo: WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Uncommo: WARREN CO. R-III
5/19/2014 2:00 Math-PT-Animal Jt WARREN CO. R-III
5/19/2014 2:00 Math-PT-Animal Jt WARREN CO. R-III
5/19/2014 2:00 Math-PT-Cell Phor WARREN CO. R-III
5/19/2014 2:00 Math-PT-Commun WARREN CO. R-III
5/19/2014 2:00 Math-PT-Commun WARREN CO. R-III
5/19/2014 2:00 Math-PT-Donuts WARREN CO. R-III
5/19/2014 2:00 Math-PT-Donuts-A WARREN CO. R-III
5/19/2014 2:00 Math-PT-Making S WARREN CO. R-III
5/19/2014 2:00 Math-PT-Making S WARREN CO. R-III
5/19/2014 2:00 Math-PT-Sandbox WARREN CO. R-III
5/19/2014 2:00 Math-PT-Sandbox- WARREN CO. R-III
5/19/2014 2:00 Math-PT-South Po WARREN CO. R-III
5/19/2014 2:00 Math-PT-Turtle Ha WARREN CO. R-III
5/19/2014 2:00 SBAC-G03-ELA-No: WARREN CO. R-III
5/19/2014 2:00 SBAC-G03-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G03-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G03-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G03-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-No: WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-ELA-No: WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G04-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-ELA-No! WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-ELA-No! WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-ELA-No! WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-ELA-No! WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-ELA-No: WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-Math-N WARREN CO. R-III


20


41


20


41 


100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 

16.92% 
100.00% 
85.86% 
0.00% 
0.00% 
0.96% 
53.28% 
60.00% 
86.21% 
0.00% 
0.00% 
0.00% 
95.83% 
53.66% 
35.09% 
0.96% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
62.30% 
61.19% 
75.00% 


100.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 

16.92% 
100.00% 
85.86% 
0.00% 
0.00% 
0.96% 
53.28% 
60.00% 
86.21% 
0.00% 
0.00% 
0.00% 
95.83% 
53.66% 
35.09% 
0.96% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
62.30% 
61.19% 
75.00% 


5/19/2014 2:00 SBAC-G05-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G05-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G06-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G06-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G07-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G07-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G07-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G07-Math-N WARREN CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WARREN CO. R-III
5/19/2014 2:00 SBAC-G08-Math-N WARREN CO. R-III
5/19/2014 2:00 ELA-PT-Growth an WARRENSBURG R- 
5/19/2014 2:00 ELA-PT-Technolog' WARRENSBURG R- 
5/19/2014 2:00 ELA-PT-Uncommo! WARRENSBURG R- 
5/19/2014 2:00 Math-PT-Sandbox WARRENSBURG R- 
5/19/2014 2:00 Math-PT-Sandbox- WARRENSBURG R- 
5/19/2014 2:00 Math-PT-Talent Sh WARRENSBURG R- 
5/19/2014 2:00 SBAC-G03-ELA-No! WARRENSBURG R-
5/19/2014 2:00 SBAC-G03-ELA-Noi WARRENSBURG R-
5/19/2014 2:00 SBAC-G03-ELA-Noi WARRENSBURG R-
5/19/2014 2:00 SBAC-G05-Math-N WARRENSBURG R-
5/19/2014 2:00 SBAC-G05-Math-N WARRENSBURG R-
5/19/2014 2:00 SBAC-G05-Math-N WARRENSBURG R-
5/19/2014 2:00 SBAC-G05-Math-N WARRENSBURG R-
5/19/2014 2:00 SBAC-G05-Math-N WARRENSBURG R-
5/19/2014 2:00 SBAC-G06-ELA-Noi WARRENSBURG R-
5/19/2014 2:00 SBAC-G06-ELA-Noi WARRENSBURG R-
5/19/2014 2:00 SBAC-G06-ELA-Noi WARRENSBURG R-
5/19/2014 2:00 SBAC-G06-ELA-No! WARRENSBURG R-
5/19/2014 2:00 SBAC-G06-ELA-Noi WARRENSBURG R-
5/19/2014 2:00 SBAC-G06-Math-N WARRENSBURG R-
5/19/2014 2:00 ELA-PT-Deserts WAYNESVILLE R-VI
5/19/2014 2:00 ELA-PT-Deserts-A WAYNESVILLE R-VI
5/19/2014 2:00 ELA-PT-Marine Ani WAYNESVILLE R-VI
5/19/2014 2:00 ELA-PT-Marine Ani WAYNESVILLE R-VI
5/19/2014 2:00 ELA-PT-Space Expl WAYNESVILLE R-VI
5/19/2014 2:00 ELA-PT-Space Expl WAYNESVILLE R-VI
5/19/2014 2:00 HS-Math-PT-Great WAYNESVILLE R-VI
5/19/2014 2:00 Math-PT-Turtle Ha WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G04-ELA-Noi WAYNESVILLE R-VI




43.48% 
83.33% 
12.69% 
21.21% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
25.00% 
99.12% 
100.00% 
50.00% 
62.41% 
0.00% 
100.00% 
100.00% 
100.00% 
84.85% 
75.76% 
50.00% 
76.47% 
81.82% 
98.21% 
100.00% 
100.00% 
100.00% 
100.00% 
0.00% 
100.00% 
83.62% 
0.00% 
0.00% 
0.00% 
83.87% 
85.09% 
92.86% 
86.21% 


43.48% 
83.33% 
12.69% 
21.21% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
25.00% 
98.67% 
100.00% 
50.00% 
57.14% 
0.00% 
100.00% 
100.00% 
100.00% 
78.79% 
51.52% 
50.00% 
50.00% 
72.73% 
94.64% 
94.74% 
50.00% 
91.07% 
94.74% 
0.00% 
50.00% 
78.75% 
0.00% 
0.00% 
0.00% 
83.44% 
84.17% 
92.86% 
82.76% 


5/19/2014 2:00 SBAC-G04-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G04-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G04-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G04-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G04-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-Math-N WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-Math-N WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-Math-N WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G05-Math-N WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-No! WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-G08-ELA-Noi WAYNESVILLE R-VI
5/19/2014 2:00 SBAC-HS-Math-No WAYNESVILLE R-VI
5/19/2014 2:00 Math-PT-Animal Jt WEBB CITY R-VII
5/19/2014 2:00 SBAC-G03-Math-N WEBB CITY R-VII
5/19/2014 2:00 SBAC-G03-Math-N WEBB CITY R-VII
5/19/2014 2:00 SBAC-G03-Math-N WEBB CITY R-VII
5/19/2014 2:00 HS-ELA-PT-A New WELLSVILLE MIDD 
5/19/2014 2:00 Math-PT-Cell Phor WELLSVILLE MIDD 
5/19/2014 2:00 SBAC-G06-Math-N WELLSVILLE MIDD
5/19/2014 2:00 SBAC-G06-Math-N WELLSVILLE MIDD
5/19/2014 2:00 SBAC-G06-Math-N WELLSVILLE MIDD
5/19/2014 2:00 SBAC-G06-Math-N WELLSVILLE MIDD
5/19/2014 2:00 SBAC-HS-ELA-Non! WELLSVILLE MIDD 
5/19/2014 2:00 Math-PT-Talent Sh WENTZVILLE R-IV 
5/19/2014 2:00 SBAC-G05-Math-N WENTZVILLE R-IV
5/19/2014 2:00 SBAC-G05-Math-N WENTZVILLE R-IV
5/19/2014 2:00 SBAC-G05-Math-N WENTZVILLE R-IV
5/19/2014 2:00 SBAC-G05-Math-N WENTZVILLE R-IV
5/19/2014 2:00 ELA-PT-The Americ WEST PLATTE CO. 
5/19/2014 2:00 HS-Math-PT-Great WEST PLATTE CO. 
5/19/2014 2:00 SBAC-G05-ELA-No: WEST PLATTE CO.
5/19/2014 2:00 SBAC-G05-ELA-No! WEST PLATTE CO.
5/19/2014 2:00 SBAC-G05-ELA-No: WEST PLATTE CO.
5/19/2014 2:00 SBAC-G05-ELA-No: WEST PLATTE CO.




91.23% 
100.00% 
81.03% 
92.98% 
84.21% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
80.65% 
87.10% 
0.00% 
86.02% 
87.10% 
82.80% 
88.07% 
0.00% 
91.38% 
95.08% 
93.22% 
95.65% 
94.74% 
100.00% 
100.00% 
100.00% 
80.00% 
95.65% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 


89.47% 
100.00% 
77.59% 
89.47% 
84.21% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
100.00% 
100.00% 
100.00% 
100.00% 
68.82% 
75.27% 
0.00% 
74.19% 
75.27% 
69.89% 
82.57% 
0.00% 
72.41% 
67.21% 
69.49% 
95.65% 
94.74% 
100.00% 
100.00% 
100.00% 
80.00% 
60.87% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
0.00% 


5/19/2014 2:00 SBAC-HS-Math-No WEST PLATTE CO. 
5/19/2014 2:00 Math-PT-Donuts WEST ST. FRANCO 
5/19/2014 2:00 Math-PT-Donuts-A WEST ST. FRANCO 
5/19/2014 2:00 SBAC-G07-Math-N WEST ST. FRANCO
5/19/2014 2:00 SBAC-G07-Math-N WEST ST. FRANCO
5/19/2014 2:00 HS-Math-PT-Great WHEATON R-III 
5/19/2014 2:00 SBAC-HS-Math-No WHEATON R-III 
5/19/2014 2:00 ELA-PT-Uncommo! WILLARD R-II 
5/19/2014 2:00 ELA-PT-Uncommo! WILLARD R-II 
5/19/2014 2:00 Math-PT-Turtle Ha WILLARD R-II 
5/19/2014 2:00 SBAC-G03-ELA-No! WILLARD R-II
5/19/2014 2:00 SBAC-G03-ELA-No! WILLARD R-II
5/19/2014 2:00 SBAC-G03-ELA-Noi WILLARD R-II
5/19/2014 2:00 SBAC-G03-ELA-No! WILLARD R-II
5/19/2014 2:00 SBAC-G04-Math-N WILLARD R-II
5/19/2014 2:00 SBAC-G04-Math-N WILLARD R-II
5/19/2014 2:00 SBAC-G04-Math-N WILLARD R-II
5/19/2014 2:00 SBAC-G04-Math-N WILLARD R-II
5/19/2014 2:00 SBAC-G04-Math-N WILLARD R-II
5/19/2014 2:00 HS-Math-PT-Great WINONA R-III 
5/19/2014 2:00 SBAC-HS-Math-No WINONA R-III 
5/19/2014 2:00 ELA-PT-Renewable WORTH CO. R-III 
5/19/2014 2:00 SBAC-G08-ELA-Noi WORTH CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WORTH CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WORTH CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WORTH CO. R-III
5/19/2014 2:00 SBAC-G08-ELA-Noi WORTH CO. R-III




0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
62.50% 
62.50% 
50.00% 
96.30% 
90.59% 
94.12% 
94.74% 
100.00% 
50.00% 
94.12% 
82.35% 
94.12% 
94.12% 
100.00% 
0.00% 
0.00% 
96.88% 
100.00% 
100.00% 
85.71% 
83.33% 
100.00% 


0.00% 
0.00% 
0.00% 
0.00% 
0.00% 
62.50% 
62.50% 
50.00% 
96.30% 
90.59% 
94.12% 
94.74% 
100.00% 
50.00% 
94.12% 
82.35% 
94.12% 
94.12% 
100.00% 
0.00% 
0.00% 
96.88% 
85.71% 
100.00% 
85.71% 
83.33% 
100.00% 


Appendix P - Grade-Level Assessment Blueprints




Blueprint for ENGLISH LANGUAGE ARTS Grades 3-4 


Claim Category Point Range Range Of Emphasis 


Speaking/Listening Listening 20%


Blueprint for ENGLISH LANGUAGE ARTS Grade 5 
Claim Category Point Range Range Of Emphasis 


Writing Conventions 12%
Speaking/Listening Listening 18%
Research Research 16%


Blueprint for ENGLISH LANGUAGE ARTS Grades 6-7 
Claim Category Point Range Range Of Emphasis 


Reading Literacy 20%


Speaking/Listening Listening 20%


Blueprint for ENGLISH LANGUAGE ARTS Grade 8 


Claim Category Point Range Range Of Emphasis 


Writing Conventions 12%
Speaking/Listening Listening 18%
Research Research 18%


Blueprint for MATHEMATICS Grades 3-4 
Claim Category Point Range Range Of Emphasis 


Communicating Reasoning Communicating Reasoning 6 19%


Blueprint for MATHEMATICS Grade 5 
Claim Category Point Range Range Of Emphasis 


Problem Solving Problem Solving 0-9 0-29%
Communicating Reasoning Communicating Reasoning 8 26%
Modeling And Data Analysis Modeling And Data Analysis 0-9 0-29%


Blueprint for MATHEMATICS Grades 6-7 
Claim Category Point Range Range Of Emphasis 


Concepts And Procedures Priority Cluster 14-15 45-48% 
Concepts And Procedures Supporting Cluster 
Problem Solving Problem Solving 0-16% 


Communicating Reasoning Communicating Reasoning 6 19%
Modeling And Data Analysis Modeling And Data Analysis 0-16% 
30-31 100% 
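
The Range Of Emphasis figures in the Grades 6-7 table appear consistent with each point value divided by the 31-point upper total and rounded to a whole percent. The short Python sketch below illustrates that arithmetic only. The 31-point denominator is an assumption inferred from the table, and the function name is illustrative; neither is stated in the blueprint itself.

```python
def emphasis_percent(points, total_points):
    """Return the whole-number share of the test that `points` represents."""
    return round(100 * points / total_points)


# Assumption: percentages are computed against the upper end of the 30-31 point total.
ASSUMED_TOTAL = 31

for label, points in [
    ("Concepts And Procedures (low)", 14),
    ("Concepts And Procedures (high)", 15),
    ("Communicating Reasoning", 6),
]:
    print(f"{label}: {points} points = {emphasis_percent(points, ASSUMED_TOTAL)}%")
```

Under this assumption the sketch reproduces the 45-48% and 19% entries shown in the table.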


Blueprint for MATHEMATICS Grade 8 


Claim Category Point Range Range Of Emphasis 


Concepts And Procedures Priority Cluster 14-15 38-41% 
Concepts And Procedures Supporting Cluster 
Problem Solving Problem Solving 0-24%


Communicating Reasoning Communicating Reasoning
Modeling And Data Analysis Modeling And Data Analysis 0-24%
36-37 100% 


Blueprint for SCIENCE Grade 5 
Content Strand Point Range Range Of Emphasis 


1. ME: Properties And Principles Of Matter And Energy 12-13% 
2. FM: Properties And Principles Of Force And Motion 5-6 8-10% 
3. LO: Characteristics and Interactions of Living Organisms 5-6 8-10% 


4. EC: Changes in Ecosystems and Interactions of Organisms with their Environments 5-7 8-12%


5. ES: Processes And Interactions Of The Earth’s Systems 12-17% 


6. UN: Composition And Structure Of The Universe And The Motions Of The Objects Within It 6-7 10-12%


7. IN: Processes Of Scientific Inquiry 14-17 23-28% 
8. ST: Impact Of Science, Technology, And Human Activity 5-6 8-10% 
Total 100%


Blueprint for SCIENCE Grade 8 
Content Strand Point Range Range Of Emphasis 


1. ME: Properties And Principles Of Matter And Energy 7-8 12-13%
2. FM: Properties And Principles Of Force And Motion 8-10% 
3. LO: Characteristics and Interactions of Living Organisms 8-10% 


4. EC: Changes in Ecosystems and Interactions of Organisms with their Environments 5-7 8-12%


5. ES: Processes And Interactions Of The Earth’s Systems 12-17% 


6. UN: Composition And Structure Of The Universe And The Motions Of The Objects Within It 6-7 10-12%


7. IN: Processes Of Scientific Inquiry 14-17 23-28% 
8. ST: Impact Of Science, Technology, And Human Activity 5-6 8-10% 
Total 100% 


Appendix Q - 2014 MAP-A Field Test School Districts and Charter Schools




District 

Academie Lafayette 
Advance R-IV 
Affton 101 
Arcadia Valley R-II 
Aurora R-VIII 

AVA R-I 

Bayless 

Belton 124 

Bevier C-4 
Bismarck R-V 

Blue Eye R-V 

Blue Springs R-IV 
Boonville R-I
BOWLING GREEN R-I 
Branson R-IV 
Braymer C-4 
Buchanan Co. R-IV 
Bunker R-II 

Callao C-8 
Camdenton R-III 
Campbell R-II 

Carl Junction R-I 
Carrollton R-VII 
Carthage R-IX 
Center 58 

Chaffee R-II 
Chilhowee R-IV 
Chillicothe R-II 
Clark Co. R-I 
Clayton 
Clearwater R-I
Clinton 

Clinton Co. R-III 
COLE CAMP R-I 
Columbia 93 
Concordia R-II 
Confluence Academies 
Crawford Co. R-I 
Dallas Co. R-I 
Dent-Phelps R-III
Dexter R-XI

Dixon R-I

Dunklin R-V 

East Newton Co. R-VI 
East Prairie R-2 
Elsberry R-II 


Ewing Marion Kauffman School 
FAIRVIEW R-XI 
Farmington R-VII 
Ferguson-Florissant R-II
Forsyth R-III

Fox C-6 

Francis Howell R-III 

Ft. Zumwalt R-II 
Gainesville R-V 

Galena R-II 

Gallatin R-V 
Gasconade Co. R-I 
Gasconade Co. R-II 
Gilman City R-IV 
Glasgow 

Grain Valley R-V 
Greenfield R-IV 


Greenville R-II 
Hale R-I 
Hamilton R-II 


Hancock Place 
Harrisonville R-IX 
Hazelwood 
Henry Co. R-I 
Hickory Co. R-I 
Independence 30 
Jasper Co. R-V 
Jennings 
Johnson Co. R-VII 
Joplin Schools 
Kearney R-I 
Kennett 39 
Kingston K-14 
KIRKSVILLE R-II 
Kirkwood R-VII 
Knob Noster R-VIII 
Knox Co. R-I 

La Monte R-IV 

La Plata R-II 
Laclede Co. R-I 
Ladue 

Lafayette Co. C-1 
LAMAR R-I 
Lawson R-XIV
Lebanon R-II 
Lee's Summit R-VII 
Lewis Co. C-1 


Licking R-VIII 

LINCOLN R-II 

Lindbergh Schools 
Logan-Rogersville R-VIII 
Lone Jack C-6 

Lonedell R-XIV

Louisiana R-II 

Macon Co. R-I 

Madison C-3 

Malden R-I 

Malta Bend R-V 
Maplewood-Richmond Heights 
Marceline R-V 

Maries Co. R-II 
Marshfield R-I 

Maryville R-II 

McDonald Co. R-I
MEADOW HEIGHTS R-II 
Mehlville R-IX 

Miller R-II 

Missouri 

Mo Schls For The Sev Disabled 
Mo School For The Blind 
Monett R-I 

Monroe City R-I 
Montgomery Co. R-II 
Morgan Co. R-II 

Mound City R-II 

Neosho R-V 

Nevada R-V 

New Franklin R-I 

New Haven 

Niangua R-V 

North Callaway Co. R-I 
North Mercer Co. R-II 
North Nodaway Co. R-VI 
North Platte Co. R-I
North St. Francois Co. R-I 
Northeast Vernon Co. R-I 
Northwest R-I 

Norwood R-I 

Oak Grove R-VI 

Odessa R7 Schools 
Orchard Farm School District 
Oregon-Howell R-II 
Otterville R-VI 

Ozark R-VI 


Palmyra R-I 

Parkway C-2 
Pattonville R-II 
Pemiscot Co. Spec. Sch. Dist. 
Perry Co. 32 

Phelps Co. R-III 

Pierce City R-VI 

Pike Co. R-III 

Pleasant Hill R-III
Poplar Bluff R-I
Putnam Co. R-I 

Puxico R-VIII 

Ralls Co. R-II 
Raymore-Peculiar R-II 
Raytown C-2 

Republic R-II 

Richards R-V 

Richland R-I 

Richmond R-XVI
Ritenour 

Rockwood R-VI 

Rolla 31 

Salem R-80 

Sarcoxie R-II 

School Of The Osage 
Schuyler Co. R-I
Scotland Co. R-I 

Scott City R-I

Scott Co. Central 

Scott Co. R-IV 

Sedalia 200 

Seymour R-II 

Shelby Co. R-IV 
Sheldon R-VIII 

Shell Knob 78 
Sherwood Cass R-VIII 
Sikeston R-6 

Silex R-I 

SKYLINE R-II 

Smithton R-VI 

South Callaway Co. R-II 
South Harrison Co. R-II 
South Holt Co. R-I
SOUTH NODAWAY CO. R-IV 
Southern Boone Co. R-I 
Southland C-9 
Southwest R-V 


Specl. Sch. Dst. St. Louis Co. 
Springfield R-XII 
St. Charles R-VI 
St. Clair R-XIII 

St. James R-I 

St. Louis City 
Stockton R-I 
Strafford R-VI 
Sullivan 

Sunrise R-IX 
Tipton R-VI 
Trenton R-IX 
Twin Rivers R-X 
Union R-XI
Union Star R-II 
University City 
Valley Park 
VAN-FAR R-I 
Verona R-VII 
Warren Co. R-III 
Warrensburg R-VI 
Waynesville R-VI 
Webster Groves 
West Plains R-VII 
Willard R-II 
Willow Springs R-IV 
WINDSOR C-1 
Winfield R-IV 
Winona R-III 
Wright City R-II 
Zalma R-V 


Appendix R - MAP-A ELA and MA Test Blueprints




DYNAMIC LEARNING MAPS


DLM Mathematics Integrated Assessment Model 
2014-15 Blueprint 


In this document, the “blueprint” refers to the pool of available Essential Elements (EEs) and the 
requirements for coverage within each conceptual area. A general description of the content covered is 
provided for each grade. The specific options and minimum expectations for each student’s assessment 
are provided with each table. Educators should consult their state department of education for 
additional guidance on selecting content. 


The specific EEs available in each grade are listed in tables beginning on the next page. EEs are organized 
according to conceptual area. 


Major Claims and Conceptual Areas in Mathematics 


Major Claim / Conceptual Area

Students demonstrate increasingly complex understanding of number sense.
    M.C1.1 Understand number structures (counting, place value, fraction)
    M.C1.2 Compare, compose, and decompose numbers and sets
    M.C1.3 Calculate accurately and efficiently using simple arithmetic operations

Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.
    M.C2.1 Understand and use geometric properties of two- and three-dimensional shapes
    M.C2.2 Solve problems involving area, perimeter, and volume

Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.
    M.C3.1 Understand and use measurement principles and units of measure

Students solve increasingly complex mathematical problems, making productive use of algebra and functions.
    M.C4.1 Use operations and models to solve problems
    M.C4.2 Understand patterns and functional thinking


Dynamic Learning Maps™ | 2014-15 Mathematics Integrated Blueprint Page 1 of 9 


Grade 3: Available Essential Elements and minimum expectation for each student's assessment 


Claim Conceptual Area


1 Students demonstrate increasingly complex understanding of number sense. 
Choose two EEs from Claim 1 in different conceptual areas, i.e., one EE in C1.1 and one EE in C1.3. 
M.C1.1 Demonstrate understanding of place value to tens. 
Count by tens using models such as objects, base ten blocks, or money. 
3.NF.1-3 Differentiate a fractional part from a whole.


M.C1.3 3.OA.4 Solve addition and subtraction problems when result is unknown, limited to operands and results within


2 Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.
All students are assessed on the EE in Claim 2.
3 Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.
Choose two EEs from Claim 3. 
M.C3.1 Tell time to the hour on a digital clock. 
Measure length of objects using standard tools, such as rulers, yardsticks, and meter sticks 
4 Students solve increasingly complex mathematical problems, making productive use of algebra and functions. 
Choose one EE from Claim 4. 
M.C4.1 Use repeated addition to find the total number of objects and determine the sum. 
Solve one-step real world problems using addition or subtraction within 20. 


M.C4.2 3.OA.9 Identify arithmetic patterns.




Grade 4: Available Essential Elements and minimum expectation for each student's assessment 


Claim | Conceptual Area | EE | Description


1 Students demonstrate increasingly complex understanding of number sense.
Choose two EEs from Claim 1 in different conceptual areas.
M.C1.1 Identify models of one half (1/2) and one fourth (1/4).
Differentiate between whole and half.
M.C1.2 Compare whole numbers to 10 using symbols (<, >, =).
Round any whole number 0-30 to the nearest ten.
2 Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.
Choose two EEs from Claim 2 in different conceptual areas.
M.C2.1 Recognize parallel lines and intersecting lines.
4.MD.5 Recognize angles in geometric shapes.
4.MD.6 Identify angles as larger and smaller.
M.C2.2 4.MD.3 Determine the area of a square or rectangle by counting units of measure (unit squares).
3 Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.
Choose two EEs from Claim 3 in different conceptual areas.
M.C3.1 Tell time using a digital clock. Tell time to the nearest hour using an analog clock.
Measure mass or volume using standard tools.
Identify coins (penny, nickel, dime, quarter) and their values.
4 Students solve increasingly complex mathematical problems, making productive use of algebra and functions.
Choose two EEs from Claim 4 in different conceptual areas.
M.C4.1 Demonstrate the connection between repeated addition and multiplication.
Solve one-step real-world problems using addition or subtraction within 100.
M.C4.2 4.OA.5 Use repeating patterns to make predictions.




Grade 5: Available Essential Elements and minimum expectation for each student's assessment 


Claim | Conceptual Area | EE | Description


1 Students demonstrate increasingly complex understanding of number sense. 
Choose three EEs from Claim 1 in at least two different conceptual areas. 
M.C1.1 Identify models of halves (1/2, 2/2) and fourths (1/4, 2/4, 3/4, 4/4). 
5.NF.2 Identify models of thirds (1/3, 2/3, 3/3) and tenths (1/10, 2/10, 3/10, 4/10, 5/10, 6/10, 7/10, 8/10, 9/10, 
10/10). 
M.C1.2 Compare numbers up to 99 using base ten models. 
Compare whole numbers up to 100 using symbols (<, >, =). 
Round two-digit whole numbers to the nearest 10 from 0-90.
M.C1.3 Multiply whole numbers up to 5x5.
Illustrate the concept of division using fair and equal shares.
2 Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.
Choose one EE from Claim 2.


M.C2.1 5.G.1-4 Sort two-dimensional figures and identify the attributes (angles, number of sides, corners, color) they 
have in common. 


5.MD.3 Identify common three-dimensional shapes.
5.MD.4-5 Determine the volume of a rectangular prism by counting units of measure (unit cubes).
3 Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.
Choose two EEs from Claim 3 in different conceptual areas. 
M.C3.1 Tell time using an analog or digital clock to the half or quarter hour. 
Use standard units to measure weight and length of objects. 
Indicate relative value of collections of coins. 
4 Students solve increasingly complex mathematical problems, making productive use of algebra and functions.
All students are assessed on the EE in C4.2.
M.C4.2 5.OA.3 Identify and extend numerical patterns.




Grade 6: Available Essential Elements and minimum expectation for each student's assessment 


Claim | Conceptual Area | EE | Description


1 Students demonstrate increasingly complex understanding of number sense. 
Choose two EEs from Claim 1 in different conceptual areas. 
M.C1.2 Compare the relationships between two unit fractions. 
6.NS.5-8 Understand that positive and negative numbers are used together to describe quantities having 
opposite directions or values (e.g., temperature above/below zero). 


M.C1.3 6.NS.2 Apply the concept of fair share and equal shares to divide. 
6.NS.3 Solve two-factor multiplication problems with products up to 50 using concrete objects and/or a 
calculator. 


2 Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.


Choose one EE from Claim 2. 
M.C2.2 Solve real-world and mathematical problems about area using unit squares. 
Solve real-world and mathematical problems about volume using unit cubes. 
3 Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.


Choose one EE from Claim 3. 
4 Students solve increasingly complex mathematical problems, making productive use of algebra and functions. 
Choose two EEs from Claim 4. 
M.C4.1 Identify equivalent number sentences. 
Apply the properties of addition to identify equivalent numerical expressions. 
Match an equation to a real-world problem in which variables are used to represent numbers. 




Grade 7: Available Essential Elements and minimum expectation for each student's assessment 


Claim | Conceptual Area | EE | Description


1 Students demonstrate increasingly complex understanding of number sense.
Choose three EEs in Claim 1; at least one in C1.1 and at least one in C1.3.
M.C1.1 7.NS.2.c-d Express a fraction with a denominator of 10 as a decimal.
7.RP.1-3 Use a ratio to model or describe a relationship.
M.C1.2 7.NS.3 Compare quantities represented as decimals in real world examples to tenths.
M.C1.3 7.NS.1 Add fractions with like denominators (halves, thirds, fourths, and tenths) with sums less than or equal to one.
7.NS.2.a Solve multiplication problems with products to 100.
Solve division problems with divisors up to five and also with a divisor of 10 without remainders.
2 Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.
Choose two EEs in Claim 2 in different conceptual areas.
M.C2.1 Match two similar geometric shapes that are proportional in size and in the same orientation.
Recognize geometric shapes with given conditions.
Recognize angles that are acute, obtuse, and right.
M.C2.2 Determine the perimeter of a rectangle by adding the measures of the sides.
3 Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.
Choose one EE from Claim 3.
M.C3.2 Compare two sets of data within a single data display such as a picture graph, line plot, or bar graph.
Describe the probability of events occurring as possible or impossible.
4 Students solve increasingly complex mathematical problems, making productive use of algebra and functions.
Choose one EE from Claim 4.
M.C4.1 7.EE.1 Use the properties of operations as strategies to demonstrate that expressions are equivalent.
M.C4.2 7.EE.2 Identify an arithmetic sequence of whole numbers with a whole number common difference.




Grade 8: Available Essential Elements and minimum expectation for each student's assessment


Claim | Conceptual Area | EE | Description


1 Students demonstrate increasingly complex understanding of number sense.
Choose two EEs in Claim 1 in different conceptual areas.
M.C1.1 8.NS.2.a Express a fraction with a denominator of 100 as a decimal.
M.C1.2 8.NS.2.b Compare quantities represented as decimals in real-world examples to hundredths.
M.C1.3 Identify the meaning of an exponent (limited to exponents of 2 and 3).
or equal to one.
2 Students demonstrate increasingly complex spatial reasoning and understanding of geometric principles.
Choose two EEs in Claim 2 in different conceptual areas.
M.C2.1 Recognize translations, rotations, and reflections of shapes.
Identify shapes that are congruent.
Identify similar shapes with and without rotation.
Compare any angle to a right angle and describe the angle as greater than, less than, or congruent to a right angle.
(limited to perimeter and area of rectangles and volume of rectangular prisms).
3 Students demonstrate increasingly complex understanding of measurement, data, and analytic procedures.
All students are assessed on the EE from C3.2.
M.C3.2 8.SP.4 Construct a graph or table from given categorical data and compare data categorized in the graph or table.
4 Students solve increasingly complex mathematical problems, making productive use of algebra and functions.
Choose two EEs from Claim 4.
M.C4.1 8.EE.7 Solve simple algebraic equations with one variable using addition and subtraction.
M.C4.2 8.EE.2 Identify a geometric sequence of whole numbers with a whole number common ratio.
8.F.1-3 Given a function table containing at least 2 complete ordered pairs, identify a missing number that completes another ordered pair (limited to linear functions).
8.F.4 Determine the values or rule of a function using a graph or a table.




High School: Available Essential Elements and minimum expectation for each student’s assessment 


Claim | Conceptual Area | EE | Description | Available Math 9 | Available Math 10 | Available Math 11 and 12


Choose a minimum of six EEs across a minimum of three Claims (see next page for Claim 4).


M.C1.3 N-CN.2.a Use the commutative, associative, and distributive properties to add, subtract, and multiply whole numbers.
decimals, using models when needed.
whole numbers, using models when needed.
N-RN.1
S-CP.1-5
S-IC.1-2 Determine the likelihood of an event occurring when the outcomes are equally likely to occur.
M.C2.1 G-CO.1 Know the attributes of perpendicular lines, parallel lines, and line segments; angles, and circles.
G-CO.4-5 Given a geometric figure and a rotation, reflection, or translation of that figure, identify the components of the two figures that are congruent.
G-CO.6-8 Identify corresponding congruent and similar parts of shapes.
G-MG.1-3 Use properties of geometric shapes to describe real-life objects.
M.C2.2 G-GPE.7 Find perimeter and area of squares and rectangles to solve real-world problems.
3 M.C3.1 N-Q.1-3 Express quantities to the appropriate precision of measurement.
M.C3.2 S-ID.1-2 Given data, construct a simple graph (table, line, pie, bar, or picture) and interpret the data.
S-ID.3 Interpret general trends on a graph or chart.
S-ID.4 Calculate the mean of a given data set (limit the number of data points to fewer than five).


4
it to solve a real-world problem.
A-SSE.1 Identify an algebraic expression involving one arithmetic operation to
Solve simple algebraic equations with one variable using multiplication 
and division. 


F-BF.2 Determine an arithmetic sequence with whole numbers when 
provided a recursive rule 


change and interpret which is faster/slower, higher/lower, etc. 
functions increase by equal amounts over equal intervals. 


F-BF.1 Select the appropriate graphical representation (first quadrant) given a 
situation involving constant rate of change 


M.C4.2 A-REI.10-12 Interpret the meaning of a point on the graph of a line.
A-SSE.4 Determine the successive term in a geometric sequence given the 
common ratio 




DLM English Language Arts Integrated Assessment Model 
2014-15 Blueprint 


In this document, the “blueprint” refers to the pool of available Essential Elements (EEs) and 
the requirements for coverage within each conceptual area. A general description of the 
content covered is provided for each grade. The specific options and minimum expectations for 
each student’s assessment are provided with each table. Educators should consult their state 
department of education for additional guidance on selecting content. 


The specific EEs available in each grade are listed in tables beginning on the next page. EEs are 
organized according to conceptual area. 


Major Claims and Conceptual Areas in ELA


Major Claim | Conceptual Area
Students can comprehend text in increasingly complex ways | ELA.C1.1 Determine critical elements of text
 | ELA.C1.2 Construct understandings of text
 | ELA.C1.3 Integrate ideas and information from text
Students can produce writing for a range of purposes and audiences | ELA.C2.1 Use writing to communicate
 | ELA.C2.2 Integrate ideas and information in writing
Students can communicate for a range of purposes and audiences | ELA.C3.1 Use language to communicate with others
 | ELA.C3.2 Clarify and contribute in discussion
Students can investigate topics and present information | ELA.C4.1 Use sources and information
 | ELA.C4.2 Collaborate and present ideas


Dynamic Learning Maps™ | 2014-15 ELA Integrated Blueprint Page 1 of 9 


Grade 3: Available Essential Elements and minimum expectation for each student’s assessment 


Conceptual Area | EE | Description


ELA.C1.1 Choose at least three EEs, including at least one RL and one RI. 


EE.RI.3.2 Identify details in a text. 


EE.RI.3.3 Order two events from a text as "first" and "next". 


EE.RI.3.5 With guidance and support, use text features including headings and key words to locate information in a 
text. 


ELA.C1.2 Choose two EEs in C1.2 (L, RL or RI) — EEs must be from different strands, i.e. RL and L, not RL and RL. 


EE.RL.3.4 Determine words and phrases that complete literal sentences in a text. 
EE.RI.3.4 Determine words and phrases that complete literal sentences in a text. 
EE.RI.3.8 Identify two related points the author makes in an informational text. 


EE.L.3.5.a Determine the literal meaning of words and phrases in context. 
EE.L.3.5.c Identify words that describe personal emotional states. 


ELA.C1.3 Choose at least one EE (RL or RI). 
EE.RL.3.9 Identify common elements in two stories in a series. 
EE.RI.3.9 Identify similarities between two texts on the same topic. 
ELA.C2.1 All students are assessed in both of these EEs through the writing assessment. 
EE.W.3.2.a Select a topic and write about it including one fact or detail.


EE.W.3.4 With guidance and support produce writing that expresses more than one idea. 


Answer who and what questions to demonstrate understanding of details in a text. 
Associate details with events in stories from diverse cultures. 

Identify the feelings of characters in a story. 

Determine the beginning, middle, and end of a familiar story with a logical order. 
Answer who and what questions to demonstrate understanding of details in a text. 




Grade 4: Available Essential Elements and minimum expectation for each student’s assessment 


Conceptual Area | EE | Description


ELA.C1.1 Choose at least three EEs in C1.1, including at least one RL and one RI. 


Use details from the text to recount what the text says. 

Use details from the text to describe characters in the story. 
Identify elements that are characteristic of stories. 

Identify explicit details in an informational text. 

Identify the main idea of a text when it is explicitly stated. 


EE.RI.4.3 Identify an explicit detail that is related to an individual, event or idea in a historical, scientific, or 
technical text. 


EE.RI.4.5 Identify elements that are characteristic of informational texts. 


ELA.C1.2 

Identify the theme or central idea of a familiar story, drama or poem. 
Determine the meaning of words in a text. 
Identify the narrator of a story. 
Determine meaning of words in text. 
Identify one or more reasons supporting a specific point in an informational text. 
Demonstrate an understanding of opposites. 

EE.RI.4.9 Compare details presented in two texts on the same topic.


EE.L.4.2.a Capitalize the first word in a sentence. 
EE.L.4.2.d Spell words phonetically, drawing on knowledge of letter-sound relationships, and/or common spelling 
patterns 


EE.W.4.2.b List words, facts, or details related to the topic. 




Grade 5: Available Essential Elements and minimum expectation for each student’s assessment 


Conceptual Area | EE | Description


ELA.C1.1 Choose at least two EEs in C1.1, including at least one RL and one RI 
EE.RL.5.1 Identify words in the text to answer a question about explicit information. 
EE.RI.5.1 Identify words in the text to answer a question about explicit information. 


EE.RI.5.5 Determine if a text tells about events, gives directions, or provides information on a topic. 


EE.RI.5.7 Locate information in print or digital sources. 
ELA.C1.2 Choose three EEs in C1.2 (L, RL, or RI); EEs must be from at least two different strands.


EE.RI.5.8 Identify the relationship between a specific point and supporting reasons in an informational text. 
EE.L.5.4.a Use sentence level context to determine which word is missing from a content area text. 
EE.L.5.5.c Demonstrate understanding of words that have similar meanings. 


ELA.C1.3 Choose at least one EE in C1.3 (RL or RI).


EE.RL.5.3 Compare two characters in a familiar story. 
EE.RL.5.5 Identify story element that undergoes change from beginning to end. 
EE.RL.5.9 Compare stories, myths, or texts with similar topics or themes. 


Identify the central idea or theme of a story, drama or poem. 
Determine the intended meaning of multi-meaning words in a text. 
Determine the point of view of the narrator. 
Identify the main idea of a text when it is not explicitly stated. 
Determine the meanings of domain-specific words and phrases. 

ELA.C1.3


EE.RI.5.3 Compare two individuals, events or ideas in a text. 


EE.RI.5.9 Compare and contrast details gained from two texts on the same topic. 


ELA.C2.1 All students are assessed in both of these EEs through the writing assessment. 


EE.W.5.2.b Provide facts, details, or other information related to the topic. 
EE.W.5.2.a Introduce a topic and write to convey information about it including visual, tactual, or multimedia 
information as appropriate. 




Grade 6: Available Essential Elements and minimum expectation for each student’s assessment 


Conceptual Area | EE | Description


EE.RI.6.5 Determine how the title fits the structure of the text.


EE.RL.6.6 Identify words or phrases in the text that describe or show what the narrator or speaker is thinking or feeling.


EE.RL.6.3 Can identify how a character responds to a challenge in a story.
EE.RL.6.5 Determine the structure of a text (e.g., story, poem, or drama).
EE.RI.6.3 Identify a detail that elaborates upon individuals, events, or ideas introduced in a text.


EE.RI.6.9 Compare and contrast how two texts describe the same event.


ELA.C2.1 All students are assessed in all three of these EEs through the writing assessment.


EE.L.6.2.b Spell untaught words phonetically, drawing on letter-sound relationships and common spelling patterns.
EE.W.6.2.a Introduce a topic and write to convey ideas and information about it including visual, tactual, or multimedia information as appropriate.


EE.W.6.2.b Provide facts, details, or other information related to the topic.




Grade 7: Available Essential Elements and minimum expectation for each student’s assessment 


Conceptual Area | EE | Description
ELA.C1.1 All students are assessed in this EE for C1.1 


EE.RI.7.5 Determine how a fact, step, or event fits into the overall structure of the text.
ELA.C1.2 Choose at least three EEs in C1.2 (at least one RL and one RI) 


Analyze text to identify where information is explicitly stated and where inferences must be drawn. 
Identify events in a text that are related to the theme or central idea. 
Determine the meaning of simple idioms and figures of speech as they are used in a text. 
Analyze text to identify where information is explicitly stated and where inferences must be drawn. 
Determine two or more central ideas in a text. 
Determine how words or phrases are used to persuade or inform a text. 
Determine an author's purpose or point of view. 
EE.RI.7.8 Determine how a claim or reason fits into the overall structure of an informational text.


Determine how two or more story elements are related.
EE.RL.7.5 Compare the structure of two or more texts (e.g., stories, poems, or dramas).
EE.RI.7.3 Determine how two individuals, events or ideas in a text are related.


EE.RI.7.9 Compare and contrast how different texts on the same topic present the details.
EE.L.7.2.a Use end punctuation when writing a sentence or question.
Spell words phonetically, drawing on knowledge of letter-sound relationships and/or common spelling patterns.
EE.W.7.2.a Introduce a topic and write to convey ideas and information about it including visual, tactual, or multimedia information as appropriate.
Provide facts, details, or other information related to the topic.


EE.W.7.2.d Select domain-specific vocabulary to use in writing about the topic.




Grade 8: Available Essential Elements and minimum expectation for each student’s assessment 


Conceptual Area | EE | Description
ELA.C1.1 All students are assessed in this EE for C1.1 


EE.RI.8.5 Locate the topic sentence and supporting details in a paragraph.
ELA.C1.2 Choose at least three EEs in C1.2 (L, RL or RI); EEs must be from at least two different strands.


EE.RI.8.6 Determine an author's purpose or point of view and identify examples from text that describe or support


EE.RI.8.8 Determine the argument made by an author in an informational text.
EE.L.8.5.a Demonstrate understanding of the use of multiple meaning words. 
ELA.C1.3 Choose at least two EEs in C1.3, including at least one RL and one RI. 


Identify which incidents in a story or drama lead to subsequent action. 
EE.RL.8.5 Compare and contrast the structure of two or more texts.
EE.RL.8.9 Compare and contrast themes, patterns of events, or characters across two or more stories or dramas.
EE.RI.8.3 Recount events in the order they were presented in the text.


EE.RI.8.9 Identify where two different texts on the same topic differ in their interpretation of the details.


Write one or more facts or details related to the topic.
EE.W.8.2.c Write complete thoughts as appropriate.
EE.W.8.2.d Use domain specific vocabulary related to the topic.


EE.W.8.2.f Provide a closing.
EE.W.8.2.a Introduce a topic clearly and write to convey ideas and information about it including visual, tactual, or 
multimedia information as appropriate. 




High School: Minimum expectation for each student’s assessment in Grades 9-10 and Grades 11-12¹


Conceptual Area | EE | Description


ELA.C1.2 Choose one EE in C1.2 (L, RL or RI).


EE.RL.9-10.1 Determine which citations demonstrate what the text says explicitly as well as inferences drawn from the text.
EE.RL.9-10.2 Recount events related to the theme or central idea, including details about character and setting.
EE.RL.9-10.4 Determine the meaning of words and phrases as they are used in a text, including idioms, analogies, and figures of speech.
EE.RL.11-12.1 Analyze a text to determine its meaning and cite textual evidence to support explicit and implicit understandings.
EE.RL.11-12.2 Recount the main events of the text which are related to the theme or central idea.
EE.RL.11-12.4 Determine how words or phrases in a text, including words with multiple meanings and figurative language, impacts the meaning.
EE.RI.9-10.1 Determine which citations demonstrate what the text says explicitly as well as inferentially.
EE.RI.9-10.2 Determine the central idea of the text and select details to support it.
EE.RI.9-10.4 Determine the meaning of words and phrases as they are used in text, including common idioms, analogies, and figures of speech.
EE.RI.9-10.5 Locate sentences that support an author's central idea or claim.
EE.RI.9-10.8 Determine how the specific claims support the argument made in an informational text.
EE.RI.11-12.1 Analyze a text to determine its meaning and cite textual evidence to support explicit and implicit understanding.
EE.RI.11-12.2 Determine the central idea of a text; recount the text.
EE.RI.11-12.4 Determine how words or phrases in a text, including words with multiple meanings and figurative language, impacts the meaning of the text.
EE.RI.11-12.8 Determine whether the claims and reasoning enhance the author's argument in an informational text.
EE.RI.11-12.5 Determine whether the structure of a text enhances an author's claim.
EE.L.9-10.4.a Use context to determine the meaning of unknown words.
EE.L.9-10.5.b Determine the intended meaning of multiple meaning words.


¹ The high school blueprint provides coverage options for students in grades 9-12 to support the various testing requirements in different states in the consortium. Each state sets its own policy for which high school grade(s) are appropriate for DLM assessments.




Conceptual Area | EE | Description
EE.L.11-12.4.a Use context to determine the meaning of unknown words.
ELA.C1.3 Choose at least three EEs in C1.3 (RL or RI) — including at least one RL and one RI. 


Determine how characters change or develop over the course of a text. 

Identify where a text deviates from a chronological presentation of events. 

Determine how characters, the setting or events change over the course of the story or drama. 
Determine how the author's choice of where to end the story contributes to the meaning. 
Determine logical connections between individuals, ideas or events in a text. 

Determine how individuals, ideas, or events change over the course of the text. 

Compare and contrast arguments made by two different texts on the same topic. 


words. 


EE.W.11-12.2.d Use domain specific vocabulary when writing claims related to a topic of study or text. 
EE.W.11-12.2.f Provide a closing or concluding statement. 


EE.L.11-12.2.b Spell most single-syllable words correctly and apply knowledge of word chunks in spelling longer 
words. 


ELA.C2.2 All students are assessed in all the EEs identified for the appropriate grade level in both conceptual areas in Claim 2. 


EE.W.9-10.2.a Introduce a topic clearly and use a clear organization to write about it including visual, tactual, or multimedia information as appropriate.
EE.W.9-10.2.b Develop the topic with facts or details.
EE.W.11-12.2.a Introduce a topic clearly and write an informative or explanatory text that conveys ideas, concepts, and information including visual, tactual, or multimedia information as appropriate.
EE.W.11-12.2.b Develop the topic with relevant facts, details, or quotes.




Appendix S: End-of-Course Blueprints




Blueprint for ALGEBRA I


Category | Code | Target | Point Range | Range Of Emphasis
Number & Quantity | HSN-RN.A | The Real Number System | |
 | HSN-Q | Quantities | |
Algebra | HSA-SSE | Seeing Structure In Expressions | | 35-53%
 | HSA-APR | Arithmetic With Polynomials And Rational Expressions | |
 | HSA-CED | Creating Equations | |
 | HSA-REI | Reasoning With Equations And Inequalities | |
Functions | HSF-IF | Interpreting Functions | 11-20 | 28-50%
 | HSF-BF | Building Functions | |
 | HSF-LE | Linear, Quadratic And Exponential Models | |
Stats & Prob | HSS-ID | Interpreting Categorical And Quantitative Data | | 8-15%
Total | | | | 100%


Performance Event: Each year the performance event may align to any specific conceptual category or to a 
group of them. The Performance Event is worth 10 points. 


Blueprint for ALGEBRA II 


Category | Code | Target | Point Range | Range Of Emphasis
Number & Quantity | HSN-CN | The Complex Number System | 0-4 | 0-10%
Algebra | HSA-SSE | Seeing Structure In Expressions | |
 | HSA-APR | Arithmetic With Polynomials And Rational Expressions | |
 | HSA-CED | Creating Equations | |
 | HSA-REI | Reasoning With Equations And Inequalities | |
Functions | HSF-IF | Interpreting Functions | | 45-60%
 | HSF-BF | Building Functions | |
 | HSF-LE | Linear, Quadratic And Exponential Models | |
Stats & Prob | HSS-ID | Interpreting Categorical And Quantitative Data | |
 | HSS-MD | Using Probability To Make Decisions | |
Total | | | |


Blueprint for AMERICAN HISTORY 


Reporting Categories | Point Range | Range Of Emphasis
 | | 18%-23%
 | 14-18 | 35%-45%
 | | 18%-23%
Geography | | 18%-23%
Total | | 100%


Blueprint for BIOLOGY 


Content Strand | Point Range | Range Of Emphasis
Characteristics and Interactions of Living Organisms | | 36%-44%
Changes in Ecosystems and Interactions of Organisms with their Environments | | 22%-25%
Scientific Inquiry | |
Total | 55 | 100%


Blueprint for ENGLISH I
Claim Category | Big Idea | Point Range | Range Of Emphasis
Reading Claim 1a | Apply reading skills to demonstrate the ability to integrate key ideas and details, interpret and analyze the craft and structure of texts, and evaluate the knowledge and ideas found in literary texts | |
Reading Claim 1b | Apply reading skills to demonstrate the ability to integrate key ideas and details, interpret and analyze the craft and structure of texts, and evaluate the knowledge and ideas found in informational text | |
Writing Claim 2a | Demonstrate the ability to produce a variety of text types and purposes | |
Writing Claim 2b | Demonstrate a command of the conventions of standard English, appropriate grade-level acquisition of vocabulary | |


Blueprint for ENGLISH II
Claim Category | Big Idea | Point Range | Range Of Emphasis
Reading Claim 1a | Apply reading skills to demonstrate the ability to integrate key ideas and details, interpret and analyze the craft and structure of texts, and evaluate the knowledge and ideas found in literary texts | |
Reading Claim 1b | Apply reading skills to demonstrate the ability to integrate key ideas and details, interpret and analyze the craft and structure of texts, and evaluate the knowledge and ideas found in informational text | |
Writing Claim 2a | Demonstrate the ability to produce a variety of text types and purposes | |
Writing Claim 2b | Demonstrate a command of the conventions of standard English, appropriate grade-level acquisition of vocabulary | |


Blueprint for GEOMETRY 


Category | Code | Target | Point Range | Range Of Emphasis


Geometry 34-40 85-100%


HSG-MG Linear, Quadratic And Exponential Models 


HSS-CP Conditional Probability And The Rules Of Probability 
Stats & Prob 


HSS-MD Using Probability To Make Decisions 
Total 


Blueprint for GOVERNMENT 


Content Strand | Point Range | Range Of Emphasis
Principles of Constitutional Democracy | | 45%-55%
Principles and Processes of Governance Systems | | 45%-55%
Total | | 100%


Blueprint for PHYSICAL SCIENCE 


Content Strand | Point Range | Range Of Emphasis
Properties And Principles Of Matter And Energy | 25-30 | 55-66%
Properties And Principles Of Force And Motion | 15-20 | 33-44%
Total | | 100%


Appendix T— Assessment Testing Windows 


Page 38 of 39 


Page 1 of 6


The Assessment Section provides professional services related to the Missouri Assessment Program
(MAP) and the National Assessment of Educational Progress (NAEP).


The Assessment Section manages test development and ongoing test maintenance, and oversees test
administration for four statewide, large-scale assessments. The MAP assessments test students’ progress
toward mastery of the Missouri Show-Me Standards.


Assessment Calendar
Assessment | Dates | Event
Personal Finance | June 9, 2014 to August 29, 2014 | Summer 2014 Window
ACCESS for ELLs | September 15, 2014 to September 26, 2014 | DTC Gathers ELL Roster and Tier Placement
EOC | DUE September 19, 2014 | Fall 2014 First Precode - Students Available in iTester 9/29 - Students May Begin Testing 10/6
ACCESS for ELLs | September 29, 2014 to October 31, 2014 | Ordering ACCESS Test Materials
ACCESS for ELLs | September 29, 2014 to November 7, 2014 | Precoding for ACCESS for ELLs
Personal Finance | October 6, 2014 to January 23, 2015 | Fall 2014 Window
EOC | October 6, 2014 to January 23, 2015 | Fall 2014 Window
EOC | DUE November 7, 2014 | Fall 2014 Second Precode - Students Available in iTester 11/17 - Students May Begin Testing 11/24


http://dese.mo.gov/college-career-readiness/assessment 9/25/2014 


ACCESS for ELLs | December 12, 2014 | Test Materials Arrive In District
MAP-A | January 2, 2015 | Science - Transfer Student Participation Deadline
MAP-A | January 5, 2015 to January 30, 2015 | Science - Collection Period One
ACCESS for ELLs | January 12, 2015 to March 6, 2015 | ACCESS for ELLs TEST WINDOW
NAEP | January 26, 2015 to March 6, 2015 | 2015 Testing Window
EOC | DUE January 30, 2015 | Spring 2015 First Precode - Students Available in iTester 2/16/15 - Students May Begin Testing 2/23/15
MAP-A | February 2, 2015 to February 27, 2015 | Science - Collection Period Two
ACCESS for ELLs | February 20, 2015 | Additional ACCESS for ELLs Materials Ordering Deadline
EOC | February 23, 2015 to May 22, 2015 | Spring 2015 Window
Personal Finance | February 23, 2015 to May 22, 2015 | Spring 2015 Window
ACCESS for ELLs | March 6, 2015 to March 20, 2015 | Districts Pack and Ship ACCESS for ELLs Materials
EOC | DUE March 6, 2015 | Spring 2015 Second Precode - Students Available in iTester 3/16/15 - Students May Begin Testing 3/23/15
ACCESS for ELLs | March 20, 2015 | Deadline to Ship ACCESS for ELLs Materials
MAP-A | March 30, 2015 to May 22, 2015 | English language arts and Mathematics - Dynamic Learning Maps (Year-Ends)


NAEP | March 30, 2015 to May 29, 2015 | TIMSS Testing Window
Grade-Level | March 30, 2015 to May 22, 2015 | MAP Grade-Level Assessments Window
EOC | DUE April 3, 2015 | Spring 2015 Third Precode - Students Available in iTester 4/20/15 - Students May Begin Testing 4/27/15
ACCESS for ELLs | May 20, 2015 to May 22, 2015 | Reports Arrive In District


Peer Review Status

All states' assessment systems are submitted to the United States Department of Education for Peer 
Review to document that they meet the requirements of the No Child Left Behind Act of 2001. The 
current Peer Review status of all components of the Missouri Assessment Program is as follows: 


Assessment | Status
Grades 3 - 8 English Language Arts and Mathematics | Full Approval
Grades 5 and 8 Science | Full Approval
Algebra I, Biology, and English II | Full Approval


Appendix U— Sample Student Report 


Page 39 of 39 


Sample Reports 


Individual Student Report 


The Individual Student Report provides information about performance on the End-of-Course
Assessment, describing the results in terms of four levels of achievement in a content area. It is
used for measuring and reflecting an individual student’s mastery toward post-secondary readiness
for a content area. It is used in instructional planning, as a point of reference during a parent/teacher
conference, and for permanent record keeping. Other sources of information should be used along
with this report when determining the student’s areas of strength or need.


Achievement-level scores describe what students can do in terms of the Course-Level Expectations 
for the content and skills assessed by the End-of-Course Assessment. Students in the Proficient or 
Advanced levels have met the standard. Students in the Below Basic or Basic levels need to work 
on the skills described for their level on pages 8-15, as well as on skills in the next higher level.


The next page includes a sample of the Individual Student Report. The following areas on the 
sample have been identified to better explain the results that are being reported: 


[A] The heading of the Individual Student Report includes the content area for the results 
being presented. A separate report is produced for each content area tested.


[B] The Student Information section contains the biographic data for the individual student 
taking the assessment. Identifying information for the MOSIS ID, gender, group, building, 
district, and test period are listed. 


[C] The individual student’s results are presented numerically as a three-digit scale score 
with the standard error (SE). An accompanying bar graph illustrates the achievement level 
obtained by the student. Achievement levels (whether Advanced, Proficient, Basic, or 
Below Basic) are based on the scale score ranges listed beneath the Achievement Scores 
heading in the table.


[D] The mean scale scores for the student’s building and district are displayed in the two rows 
below the student’s individual results. The mean scale score, with an associated SE, and the 
bar graph provide a way to view the individual’s results in contrast to the group’s results 
for the content area during the same test period. 


[E] The narrative describes the student performance characteristics corresponding to the level 
of achievement obtained. The text is specific to the content area tested. At the bottom of
the page is the URL, which provides additional information for all of the achievement 
levels for the content area. 
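

As an illustration of item [C], the lookup from a scale score to an achievement level can be sketched in a few lines of Python. This is only a sketch, not the Department's scoring procedure: the cut scores are the ranges printed on the sample English II report (100-179, 180-199, 200-224, 225-250), other content areas may use different ranges, and the function names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the Basic, Proficient, and Advanced ranges, as printed on
# the sample English II report. Assumption: other content areas may differ.
CUT_SCORES = [180, 200, 225]
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]
MIN_SCORE, MAX_SCORE = 100, 250


def achievement_level(scale_score: int) -> str:
    """Return the achievement level for a three-digit scale score."""
    if not MIN_SCORE <= scale_score <= MAX_SCORE:
        raise ValueError(f"scale score {scale_score} is outside 100-250")
    # bisect_right counts how many cut scores are <= scale_score,
    # which is exactly the index of the level the score falls in.
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]


def met_standard(scale_score: int) -> bool:
    """Proficient and Advanced students have met the standard."""
    return achievement_level(scale_score) in ("Proficient", "Advanced")
```

For example, achievement_level(199) gives "Basic" and achievement_level(200) gives "Proficient", so met_standard changes from False to True at a scale score of 200.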




Missouri Department of Elementary & Secondary Education


Individual Student Report for:


Jane E Dow 


MOSIS ID: 1536879236 
Gender: F 

Building: Washington HS 
Building Code: 9999 

District: Jefferson SD 
District Code: 999999 


Test Period: Summer 2014 


Jane E Dow
Achievement Level: Proficient

Students performing at the Proficient level on the Missouri English II End-of-Course Assessment
demonstrate an understanding of the skills and processes identified in the Course Level Expectations
for English II. They demonstrate these skills in reading processes and in responding to both fiction
and nonfiction texts. In addition to understanding and applying the skills at the Basic level, students
scoring at the Proficient level use a range of strategies to comprehend and interpret a variety of texts,
demonstrate an understanding of literary forms, and apply strategies for accessing and summarizing
information. They correctly apply the rules and conventions of Standard English.


End-of-Course Assessment 
English II


Achievement Scores 


Below Basic | Basic | Proficient | Advanced
100-179 | 180-199 | 200-224 | 225-250


[Bar graph: scale scores with standard errors for Jane E Dow, Washington HS, and Jefferson SD; only the value 190 is legible in the scan]


About Achievement Levels 


Below Basic (100-179): Students demonstrate little understanding of the skills and processes identified in the Course Level Expectations for English II.
Basic (180-199): Students demonstrate an incomplete understanding of the skills and processes identified in the Course Level Expectations for English II.
Proficient (200-224): Students demonstrate an understanding of the skills and processes identified in the Course Level Expectations for English II.
Advanced (225-250): Students demonstrate a thorough understanding of the skills and processes identified in the Course Level Expectations for English II.


For more information about achievement levels, please visit the following web site: 
http://dese.mo.gov/college-career-readiness/assessment/end-course/general-resources 




